LKML Archive on lore.kernel.org
* [RFC][PATCH 00/14 v2] function_graph: Rewrite to allow multiple users
@ 2019-05-20 14:20 Steven Rostedt
  2019-05-20 14:20 ` [RFC][PATCH 01/14 v2] function_graph: Convert ret_stack to a series of longs Steven Rostedt
                   ` (14 more replies)
  0 siblings, 15 replies; 31+ messages in thread
From: Steven Rostedt @ 2019-05-20 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Namhyung Kim,
	Frank Ch. Eigler


The background for this is explained in the V1 version found here:

 http://lkml.kernel.org/r/20181122012708.491151844@goodmis.org

The TL;DR is this:

 The function graph tracer required a rewrite, mainly because it
 can only allow one callback registered at a time. The main motivation
 for this change is to allow kretprobes to use the code of function
 graph tracer, which should allow all archs that have function graph
 tracing to also have kretprobes with no extra work.

Masami told me that one requirement was to allow the function entry
callback to store data on the shadow stack that can be retrieved by
the function return callback. I added this, as well as a per-task
variable (used by one of the function graph users).

The two functions to allow the storing of data on the stack and
retrieval of it are:

 void *fgraph_reserve_data(int size_in_bytes)

    Allows the entry function to reserve up to 4 words of data on
    the shadow stack. On success, a pointer to the contents is returned.
    This may only be called once per entry function.

 void *fgraph_retrieve_data(void)

    Allows the return function to retrieve the reserved data that was
    allocated by the entry function.
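
As a rough illustration of the reserve/retrieve contract described above,
here is a minimal user-space sketch. It is not the kernel implementation:
struct model_shadow_stack and the model_*() helpers are hypothetical names
made up for this example, and the stack size is an arbitrary assumption.
It only models the "up to 4 words" limit and the "once per entry" rule.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model; none of these names come from the patch. */
#define MODEL_STACK_WORDS	64	/* arbitrary size for the sketch */
#define MODEL_MAX_DATA_WORDS	4	/* cover letter: "up to 4 words" */

struct model_shadow_stack {
	unsigned long words[MODEL_STACK_WORDS];
	int top;		/* index of the next free word */
	int data_words;		/* size of the current reservation, 0 if none */
};

/* Entry side: reserve size_in_bytes on the stack, at most once per entry. */
static void *model_reserve_data(struct model_shadow_stack *s, int size_in_bytes)
{
	int words;

	if (size_in_bytes <= 0)
		return NULL;

	/* Round the request up to whole words. */
	words = (size_in_bytes + (int)sizeof(long) - 1) / (int)sizeof(long);
	if (words > MODEL_MAX_DATA_WORDS)
		return NULL;
	if (s->data_words)	/* already reserved for this entry */
		return NULL;
	if (s->top + words > MODEL_STACK_WORDS)
		return NULL;	/* stack is full */

	s->data_words = words;
	s->top += words;
	return &s->words[s->top - words];
}

/* Return side: find the data that the entry side reserved, if any. */
static void *model_retrieve_data(struct model_shadow_stack *s)
{
	if (!s->data_words)
		return NULL;
	return &s->words[s->top - s->data_words];
}
```

In this model an entry callback would call model_reserve_data() and fill in
the returned words, and the matching return callback would read them back
through model_retrieve_data().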

Note, this code has passed my full test suite.

Changes since v1:

  - Well, the first part of that series was already merged.
    But that was just the preparation for this part.

  - Allocate a page for the shadow stack and split it up that way.
    When the stack is full, we stop allowing more to be added (stop tracing).

  - Added the reserve and retrieve of private data on the shadow stack
    for individual entry/return callbacks to pass data to each other.

  - Added "per task" data that can be used by an fgraph_ops for all
    function callbacks for a specific task.

Steven Rostedt (VMware) (14):
      function_graph: Convert ret_stack to a series of longs
      function_graph: Add an array structure that will allow multiple callbacks
      function_graph: Allow multiple users to attach to function graph
      function_graph: Remove logic around ftrace_graph_entry and return
      ftrace/function_graph: Pass fgraph_ops to function graph callbacks
      ftrace: Allow function_graph tracer to be enabled in instances
      ftrace: Allow ftrace startup flags exist without dynamic ftrace
      function_graph: Have the instances use their own ftrace_ops for filtering
      function_graph: Add "task variables" per task for fgraph_ops
      function_graph: Move set_graph_function tests to shadow stack global var
      function_graph: Move graph depth stored data to shadow stack global var
      function_graph: Move graph notrace bit to shadow stack global var
      function_graph: Implement fgraph_reserve_data() and fgraph_retrieve_data()
      function_graph: Add selftest for passing local variables

----
 include/linux/ftrace.h               |  37 +-
 include/linux/sched.h                |   2 +-
 kernel/trace/fgraph.c                | 862 ++++++++++++++++++++++++++++-------
 kernel/trace/ftrace.c                |  13 +-
 kernel/trace/ftrace_internal.h       |   2 -
 kernel/trace/trace.h                 | 132 +++---
 kernel/trace/trace_functions.c       |   7 +
 kernel/trace/trace_functions_graph.c |  96 ++--
 kernel/trace/trace_irqsoff.c         |  10 +-
 kernel/trace/trace_sched_wakeup.c    |  10 +-
 kernel/trace/trace_selftest.c        | 168 ++++++-
 11 files changed, 1048 insertions(+), 291 deletions(-)


* [RFC][PATCH 01/14 v2] function_graph: Convert ret_stack to a series of longs
  2019-05-20 14:20 [RFC][PATCH 00/14 v2] function_graph: Rewrite to allow multiple users Steven Rostedt
@ 2019-05-20 14:20 ` Steven Rostedt
  2019-05-24 11:11   ` Peter Zijlstra
  2019-05-20 14:20 ` [RFC][PATCH 02/14 v2] function_graph: Add an array structure that will allow multiple callbacks Steven Rostedt
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 31+ messages in thread
From: Steven Rostedt @ 2019-05-20 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Namhyung Kim,
	Frank Ch. Eigler

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

In order to make it possible to have multiple callbacks registered with the
function_graph tracer, the retstack needs to be converted from an array of
ftrace_ret_stack structures to an array of longs. This will allow storing
the list of callbacks on the stack for the return side of the functions.
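
To make the idea concrete, here is a small user-space sketch of keeping
fixed-size entries in an array of longs and stepping through it by the
number of words one entry needs. The model_* names, the simplified struct,
and the 512-word size are assumptions for illustration only. The patch
itself casts into the array (see RET_STACK()); the sketch uses memcpy()
instead so that it stays within portable C.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct ftrace_ret_stack (fields simplified). */
struct model_ret_stack {
	unsigned long ret;
	unsigned long func;
	unsigned long long calltime;
};

#define MODEL_WORD		((int)sizeof(long))
/* Words needed to hold one entry, rounded up (cf. FGRAPH_RET_INDEX). */
#define MODEL_RET_INDEX \
	(((int)sizeof(struct model_ret_stack) + MODEL_WORD - 1) / MODEL_WORD)
#define MODEL_STACK_WORDS	512	/* assumed size, not PAGE_SIZE */
/* Leave room for one entry at the end (cf. SHADOW_STACK_MAX_INDEX). */
#define MODEL_MAX_INDEX		(MODEL_STACK_WORDS - MODEL_RET_INDEX)

struct model_task {
	unsigned long ret_stack[MODEL_STACK_WORDS];
	int curr_ret_stack;	/* word index of the next free slot, starts at 0 */
};

/* Push one entry; fails when the stack is full. */
static int model_push(struct model_task *t, const struct model_ret_stack *e)
{
	if (t->curr_ret_stack >= MODEL_MAX_INDEX)
		return -1;
	memcpy(&t->ret_stack[t->curr_ret_stack], e, sizeof(*e));
	t->curr_ret_stack += MODEL_RET_INDEX;
	return 0;
}

/* Pop the most recent entry; fails when the stack is empty. */
static int model_pop(struct model_task *t, struct model_ret_stack *e)
{
	int index = t->curr_ret_stack - MODEL_RET_INDEX;

	if (index < 0)
		return -1;
	memcpy(e, &t->ret_stack[index], sizeof(*e));
	t->curr_ret_stack = index;
	return 0;
}
```

Note that in this model, as in the patch, an empty stack is index 0 rather
than -1, and the index always points at the next free word.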

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 include/linux/sched.h |   2 +-
 kernel/trace/fgraph.c | 124 ++++++++++++++++++++++++------------------
 2 files changed, 71 insertions(+), 55 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 11837410690f..1850d8a3c3f0 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1113,7 +1113,7 @@ struct task_struct {
 	int				curr_ret_depth;
 
 	/* Stack of return addresses for return function tracing: */
-	struct ftrace_ret_stack		*ret_stack;
+	unsigned long			*ret_stack;
 
 	/* Timestamp for last schedule: */
 	unsigned long long		ftrace_timestamp;
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 8dfd5021b933..db0c387756a3 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -23,6 +23,18 @@
 #define ASSIGN_OPS_HASH(opsname, val)
 #endif
 
+#define FGRAPH_RET_SIZE (sizeof(struct ftrace_ret_stack))
+#define FGRAPH_RET_INDEX (ALIGN(FGRAPH_RET_SIZE, sizeof(long)) / sizeof(long))
+#define SHADOW_STACK_SIZE (PAGE_SIZE)
+#define SHADOW_STACK_INDEX			\
+	(ALIGN(SHADOW_STACK_SIZE, sizeof(long)) / sizeof(long))
+/* Leave on a buffer at the end */
+#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - FGRAPH_RET_INDEX)
+
+#define RET_STACK(t, index) ((struct ftrace_ret_stack *)(&(t)->ret_stack[index]))
+#define RET_STACK_INC(c) ({ c += FGRAPH_RET_INDEX; })
+#define RET_STACK_DEC(c) ({ c -= FGRAPH_RET_INDEX; })
+
 static bool kill_ftrace_graph;
 int ftrace_graph_active;
 
@@ -59,6 +71,7 @@ static int
 ftrace_push_return_trace(unsigned long ret, unsigned long func,
 			 unsigned long frame_pointer, unsigned long *retp)
 {
+	struct ftrace_ret_stack *ret_stack;
 	unsigned long long calltime;
 	int index;
 
@@ -75,23 +88,25 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
 	smp_rmb();
 
 	/* The return trace stack is full */
-	if (current->curr_ret_stack == FTRACE_RETFUNC_DEPTH - 1) {
+	if (current->curr_ret_stack >= SHADOW_STACK_MAX_INDEX) {
 		atomic_inc(&current->trace_overrun);
 		return -EBUSY;
 	}
 
 	calltime = trace_clock_local();
 
-	index = ++current->curr_ret_stack;
+	index = current->curr_ret_stack;
+	RET_STACK_INC(current->curr_ret_stack);
+	ret_stack = RET_STACK(current, index);
 	barrier();
-	current->ret_stack[index].ret = ret;
-	current->ret_stack[index].func = func;
-	current->ret_stack[index].calltime = calltime;
+	ret_stack->ret = ret;
+	ret_stack->func = func;
+	ret_stack->calltime = calltime;
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
-	current->ret_stack[index].fp = frame_pointer;
+	ret_stack->fp = frame_pointer;
 #endif
 #ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
-	current->ret_stack[index].retp = retp;
+	ret_stack->retp = retp;
 #endif
 	return 0;
 }
@@ -113,7 +128,7 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 
 	return 0;
  out_ret:
-	current->curr_ret_stack--;
+	RET_STACK_DEC(current->curr_ret_stack);
  out:
 	current->curr_ret_depth--;
 	return -EBUSY;
@@ -124,11 +139,13 @@ static void
 ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
 			unsigned long frame_pointer)
 {
+	struct ftrace_ret_stack *ret_stack;
 	int index;
 
 	index = current->curr_ret_stack;
+	RET_STACK_DEC(index);
 
-	if (unlikely(index < 0 || index >= FTRACE_RETFUNC_DEPTH)) {
+	if (unlikely(index < 0 || index > SHADOW_STACK_MAX_INDEX)) {
 		ftrace_graph_stop();
 		WARN_ON(1);
 		/* Might as well panic, otherwise we have no where to go */
@@ -136,6 +153,7 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
 		return;
 	}
 
+	ret_stack = RET_STACK(current, index);
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	/*
 	 * The arch may choose to record the frame pointer used
@@ -151,22 +169,22 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
 	 * Note, -mfentry does not use frame pointers, and this test
 	 *  is not needed if CC_USING_FENTRY is set.
 	 */
-	if (unlikely(current->ret_stack[index].fp != frame_pointer)) {
+	if (unlikely(ret_stack->fp != frame_pointer)) {
 		ftrace_graph_stop();
 		WARN(1, "Bad frame pointer: expected %lx, received %lx\n"
 		     "  from func %ps return to %lx\n",
 		     current->ret_stack[index].fp,
 		     frame_pointer,
-		     (void *)current->ret_stack[index].func,
-		     current->ret_stack[index].ret);
+		     (void *)ret_stack->func,
+		     ret_stack->ret);
 		*ret = (unsigned long)panic;
 		return;
 	}
 #endif
 
-	*ret = current->ret_stack[index].ret;
-	trace->func = current->ret_stack[index].func;
-	trace->calltime = current->ret_stack[index].calltime;
+	*ret = ret_stack->ret;
+	trace->func = ret_stack->func;
+	trace->calltime = ret_stack->calltime;
 	trace->overrun = atomic_read(&current->trace_overrun);
 	trace->depth = current->curr_ret_depth--;
 	/*
@@ -220,7 +238,7 @@ unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
 	 * curr_ret_stack is after that.
 	 */
 	barrier();
-	current->curr_ret_stack--;
+	RET_STACK_DEC(current->curr_ret_stack);
 
 	if (unlikely(!ret)) {
 		ftrace_graph_stop();
@@ -246,12 +264,13 @@ unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
 struct ftrace_ret_stack *
 ftrace_graph_get_ret_stack(struct task_struct *task, int idx)
 {
-	idx = task->curr_ret_stack - idx;
+	int index = task->curr_ret_stack;
 
-	if (idx >= 0 && idx <= task->curr_ret_stack)
-		return &task->ret_stack[idx];
+	index -= FGRAPH_RET_INDEX * (idx + 1);
+	if (index < 0)
+		return NULL;
 
-	return NULL;
+	return RET_STACK(task, index);
 }
 
 /**
@@ -273,18 +292,20 @@ ftrace_graph_get_ret_stack(struct task_struct *task, int idx)
 unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
 				    unsigned long ret, unsigned long *retp)
 {
+	struct ftrace_ret_stack *ret_stack;
 	int index = task->curr_ret_stack;
 	int i;
 
 	if (ret != (unsigned long)return_to_handler)
 		return ret;
 
-	if (index < 0)
-		return ret;
+	RET_STACK_DEC(index);
 
-	for (i = 0; i <= index; i++)
-		if (task->ret_stack[i].retp == retp)
-			return task->ret_stack[i].ret;
+	for (i = index; i >= 0; RET_STACK_DEC(i)) {
+		ret_stack = RET_STACK(task, i);
+		if (ret_stack->retp == retp)
+			return ret_stack->ret;
+	}
 
 	return ret;
 }
@@ -298,14 +319,15 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
 		return ret;
 
 	task_idx = task->curr_ret_stack;
+	RET_STACK_DEC(task_idx);
 
 	if (!task->ret_stack || task_idx < *idx)
 		return ret;
 
 	task_idx -= *idx;
-	(*idx)++;
+	RET_STACK_INC(*idx);
 
-	return task->ret_stack[task_idx].ret;
+	return RET_STACK(task, task_idx)->ret;
 }
 #endif /* HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
 
@@ -339,7 +361,7 @@ trace_func_graph_ent_t ftrace_graph_entry = ftrace_graph_entry_stub;
 static trace_func_graph_ent_t __ftrace_graph_entry = ftrace_graph_entry_stub;
 
 /* Try to assign a return stack array on FTRACE_RETSTACK_ALLOC_SIZE tasks. */
-static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
+static int alloc_retstack_tasklist(unsigned long **ret_stack_list)
 {
 	int i;
 	int ret = 0;
@@ -347,10 +369,7 @@ static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
 	struct task_struct *g, *t;
 
 	for (i = 0; i < FTRACE_RETSTACK_ALLOC_SIZE; i++) {
-		ret_stack_list[i] =
-			kmalloc_array(FTRACE_RETFUNC_DEPTH,
-				      sizeof(struct ftrace_ret_stack),
-				      GFP_KERNEL);
+		ret_stack_list[i] = kmalloc(SHADOW_STACK_SIZE, GFP_KERNEL);
 		if (!ret_stack_list[i]) {
 			start = 0;
 			end = i;
@@ -369,9 +388,9 @@ static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
 		if (t->ret_stack == NULL) {
 			atomic_set(&t->tracing_graph_pause, 0);
 			atomic_set(&t->trace_overrun, 0);
-			t->curr_ret_stack = -1;
+			t->curr_ret_stack = 0;
 			t->curr_ret_depth = -1;
-			/* Make sure the tasks see the -1 first: */
+			/* Make sure the tasks see the 0 first: */
 			smp_wmb();
 			t->ret_stack = ret_stack_list[start++];
 		}
@@ -389,6 +408,7 @@ static void
 ftrace_graph_probe_sched_switch(void *ignore, bool preempt,
 			struct task_struct *prev, struct task_struct *next)
 {
+	struct ftrace_ret_stack *ret_stack;
 	unsigned long long timestamp;
 	int index;
 
@@ -413,8 +433,11 @@ ftrace_graph_probe_sched_switch(void *ignore, bool preempt,
 	 */
 	timestamp -= next->ftrace_timestamp;
 
-	for (index = next->curr_ret_stack; index >= 0; index--)
-		next->ret_stack[index].calltime += timestamp;
+	for (index = next->curr_ret_stack - FGRAPH_RET_INDEX; index >= 0; ) {
+		ret_stack = RET_STACK(next, index);
+		ret_stack->calltime += timestamp;
+		index -= FGRAPH_RET_INDEX;
+	}
 }
 
 static int ftrace_graph_entry_test(struct ftrace_graph_ent *trace)
@@ -457,10 +480,10 @@ void update_function_graph_func(void)
 		ftrace_graph_entry = __ftrace_graph_entry;
 }
 
-static DEFINE_PER_CPU(struct ftrace_ret_stack *, idle_ret_stack);
+static DEFINE_PER_CPU(unsigned long *, idle_ret_stack);
 
 static void
-graph_init_task(struct task_struct *t, struct ftrace_ret_stack *ret_stack)
+graph_init_task(struct task_struct *t, unsigned long *ret_stack)
 {
 	atomic_set(&t->tracing_graph_pause, 0);
 	atomic_set(&t->trace_overrun, 0);
@@ -476,7 +499,7 @@ graph_init_task(struct task_struct *t, struct ftrace_ret_stack *ret_stack)
  */
 void ftrace_graph_init_idle_task(struct task_struct *t, int cpu)
 {
-	t->curr_ret_stack = -1;
+	t->curr_ret_stack = 0;
 	t->curr_ret_depth = -1;
 	/*
 	 * The idle task has no parent, it either has its own
@@ -486,14 +509,11 @@ void ftrace_graph_init_idle_task(struct task_struct *t, int cpu)
 		WARN_ON(t->ret_stack != per_cpu(idle_ret_stack, cpu));
 
 	if (ftrace_graph_active) {
-		struct ftrace_ret_stack *ret_stack;
+		unsigned long *ret_stack;
 
 		ret_stack = per_cpu(idle_ret_stack, cpu);
 		if (!ret_stack) {
-			ret_stack =
-				kmalloc_array(FTRACE_RETFUNC_DEPTH,
-					      sizeof(struct ftrace_ret_stack),
-					      GFP_KERNEL);
+			ret_stack = kmalloc(SHADOW_STACK_SIZE, GFP_KERNEL);
 			if (!ret_stack)
 				return;
 			per_cpu(idle_ret_stack, cpu) = ret_stack;
@@ -507,15 +527,13 @@ void ftrace_graph_init_task(struct task_struct *t)
 {
 	/* Make sure we do not use the parent ret_stack */
 	t->ret_stack = NULL;
-	t->curr_ret_stack = -1;
+	t->curr_ret_stack = 0;
 	t->curr_ret_depth = -1;
 
 	if (ftrace_graph_active) {
-		struct ftrace_ret_stack *ret_stack;
+		unsigned long *ret_stack;
 
-		ret_stack = kmalloc_array(FTRACE_RETFUNC_DEPTH,
-					  sizeof(struct ftrace_ret_stack),
-					  GFP_KERNEL);
+		ret_stack = kmalloc(SHADOW_STACK_SIZE, GFP_KERNEL);
 		if (!ret_stack)
 			return;
 		graph_init_task(t, ret_stack);
@@ -524,7 +542,7 @@ void ftrace_graph_init_task(struct task_struct *t)
 
 void ftrace_graph_exit_task(struct task_struct *t)
 {
-	struct ftrace_ret_stack	*ret_stack = t->ret_stack;
+	unsigned long *ret_stack = t->ret_stack;
 
 	t->ret_stack = NULL;
 	/* NULL must become visible to IRQs before we free it: */
@@ -536,12 +554,10 @@ void ftrace_graph_exit_task(struct task_struct *t)
 /* Allocate a return stack for each task */
 static int start_graph_tracing(void)
 {
-	struct ftrace_ret_stack **ret_stack_list;
+	unsigned long **ret_stack_list;
 	int ret, cpu;
 
-	ret_stack_list = kmalloc_array(FTRACE_RETSTACK_ALLOC_SIZE,
-				       sizeof(struct ftrace_ret_stack *),
-				       GFP_KERNEL);
+	ret_stack_list = kmalloc(SHADOW_STACK_SIZE, GFP_KERNEL);
 
 	if (!ret_stack_list)
 		return -ENOMEM;
-- 
2.20.1




* [RFC][PATCH 02/14 v2] function_graph: Add an array structure that will allow multiple callbacks
  2019-05-20 14:20 [RFC][PATCH 00/14 v2] function_graph: Rewrite to allow multiple users Steven Rostedt
  2019-05-20 14:20 ` [RFC][PATCH 01/14 v2] function_graph: Convert ret_stack to a series of longs Steven Rostedt
@ 2019-05-20 14:20 ` Steven Rostedt
  2019-05-20 14:20 ` [RFC][PATCH 03/14 v2] function_graph: Allow multiple users to attach to function graph Steven Rostedt
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Steven Rostedt @ 2019-05-20 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Namhyung Kim,
	Frank Ch. Eigler

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Add an array structure that will eventually allow the function graph tracer
to have up to 16 simultaneous callbacks attached. It's an array of 16
fgraph_ops pointers, each of which is assigned when an fgraph_ops is
registered. For now, on entry of a function only the entryfunc of the first
item in the array is called. The callback returns non-zero if it wants the
return callback to be called on exit of the function.

The array will simplify the process of having more than one callback
attached to the same function, as its index into the array can be stored on
the shadow stack. We need to only save the index, because this will allow
the fgraph_ops to be freed before the function returns (which may happen if
the function calls schedule() and sleeps for a long time).
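
A stand-alone sketch of the slot handling may help here. It is a user-space
model with hypothetical names (model_ops, model_array, model_register(),
model_unregister()) and simplified callback signatures, not the code in
kernel/trace/fgraph.c. The stub-on-unregister step is an assumption based on
the behavior described later in this series; this patch itself only fills
the array with stubs at registration time.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_ARRAY_SIZE	16

/* Hypothetical, simplified stand-in for struct fgraph_ops. */
struct model_ops {
	int  (*entryfunc)(unsigned long func);
	void (*retfunc)(unsigned long func);
};

static int model_entry_stub(unsigned long func)
{
	(void)func;
	return 0;	/* zero: do not call retfunc on exit */
}

static void model_ret_stub(unsigned long func)
{
	(void)func;
}

static struct model_ops model_stub = {
	.entryfunc = model_entry_stub,
	.retfunc = model_ret_stub,
};

static struct model_ops *model_array[MODEL_ARRAY_SIZE];

/* Returns the slot index on success, -1 if all slots are taken. */
static int model_register(struct model_ops *ops)
{
	int i;

	/* The array must always point at callable ops, never at NULL. */
	if (!model_array[0]) {
		for (i = 0; i < MODEL_ARRAY_SIZE; i++)
			model_array[i] = &model_stub;
	}

	/* Look for an available spot. */
	for (i = 0; i < MODEL_ARRAY_SIZE; i++) {
		if (model_array[i] == &model_stub)
			break;
	}
	if (i >= MODEL_ARRAY_SIZE)
		return -1;

	model_array[i] = ops;
	return i;
}

/* Assumption: a freed slot goes back to the stub so late returns are safe. */
static void model_unregister(int index)
{
	model_array[index] = &model_stub;
}
```

Because a saved index always resolves to either a live ops or the stub, a
return that arrives long after model_unregister() can still call
model_array[index]->retfunc() without touching freed memory.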

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 kernel/trace/fgraph.c | 98 ++++++++++++++++++++++++++++---------------
 1 file changed, 65 insertions(+), 33 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index db0c387756a3..c765906d846d 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -38,9 +38,27 @@
 static bool kill_ftrace_graph;
 int ftrace_graph_active;
 
+#define FGRAPH_ARRAY_SIZE	16
+
+static struct fgraph_ops *fgraph_array[FGRAPH_ARRAY_SIZE];
+
 /* Both enabled by default (can be cleared by function_graph tracer flags */
 static bool fgraph_sleep_time = true;
 
+int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace)
+{
+	return 0;
+}
+
+static void ftrace_graph_ret_stub(struct ftrace_graph_ret *trace)
+{
+}
+
+static struct fgraph_ops fgraph_stub = {
+	.entryfunc = ftrace_graph_entry_stub,
+	.retfunc = ftrace_graph_ret_stub,
+};
+
 /**
  * ftrace_graph_is_dead - returns true if ftrace_graph_stop() was called
  *
@@ -123,7 +141,7 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 		goto out;
 
 	/* Only trace if the calling function expects to */
-	if (!ftrace_graph_entry(&trace))
+	if (!fgraph_array[0]->entryfunc(&trace))
 		goto out_ret;
 
 	return 0;
@@ -231,7 +249,7 @@ unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
 
 	ftrace_pop_return_trace(&trace, &ret, frame_pointer);
 	trace.rettime = trace_clock_local();
-	ftrace_graph_return(&trace);
+	fgraph_array[0]->retfunc(&trace);
 	/*
 	 * The ftrace_graph_return() may still access the current
 	 * ret_stack structure, we need to make sure the update of
@@ -349,11 +367,6 @@ void ftrace_graph_sleep_time_control(bool enable)
 	fgraph_sleep_time = enable;
 }
 
-int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace)
-{
-	return 0;
-}
-
 /* The callbacks that hook a function */
 trace_func_graph_ret_t ftrace_graph_return =
 			(trace_func_graph_ret_t)ftrace_stub;
@@ -586,37 +599,53 @@ static int start_graph_tracing(void)
 int register_ftrace_graph(struct fgraph_ops *gops)
 {
 	int ret = 0;
+	int i;
 
 	mutex_lock(&ftrace_lock);
 
-	/* we currently allow only one tracer registered at a time */
-	if (ftrace_graph_active) {
+	if (!fgraph_array[0]) {
+		/* The array must always have real data on it */
+		for (i = 0; i < FGRAPH_ARRAY_SIZE; i++) {
+			fgraph_array[i] = &fgraph_stub;
+		}
+	}
+
+	/* Look for an available spot */
+	for (i = 0; i < FGRAPH_ARRAY_SIZE; i++) {
+		if (fgraph_array[i] == &fgraph_stub)
+			break;
+	}
+	if (i >= FGRAPH_ARRAY_SIZE) {
 		ret = -EBUSY;
 		goto out;
 	}
 
-	register_pm_notifier(&ftrace_suspend_notifier);
+	fgraph_array[i] = gops;
 
 	ftrace_graph_active++;
-	ret = start_graph_tracing();
-	if (ret) {
-		ftrace_graph_active--;
-		goto out;
-	}
 
-	ftrace_graph_return = gops->retfunc;
+	if (ftrace_graph_active == 1) {
+		register_pm_notifier(&ftrace_suspend_notifier);
+		ret = start_graph_tracing();
+		if (ret) {
+			ftrace_graph_active--;
+			goto out;
+		}
+
+		ftrace_graph_return = gops->retfunc;
 
-	/*
-	 * Update the indirect function to the entryfunc, and the
-	 * function that gets called to the entry_test first. Then
-	 * call the update fgraph entry function to determine if
-	 * the entryfunc should be called directly or not.
-	 */
-	__ftrace_graph_entry = gops->entryfunc;
-	ftrace_graph_entry = ftrace_graph_entry_test;
-	update_function_graph_func();
+		/*
+		 * Update the indirect function to the entryfunc, and the
+		 * function that gets called to the entry_test first. Then
+		 * call the update fgraph entry function to determine if
+		 * the entryfunc should be called directly or not.
+		 */
+		__ftrace_graph_entry = gops->entryfunc;
+		ftrace_graph_entry = ftrace_graph_entry_test;
+		update_function_graph_func();
 
-	ret = ftrace_startup(&graph_ops, FTRACE_START_FUNC_RET);
+		ret = ftrace_startup(&graph_ops, FTRACE_START_FUNC_RET);
+	}
 out:
 	mutex_unlock(&ftrace_lock);
 	return ret;
@@ -624,19 +653,22 @@ int register_ftrace_graph(struct fgraph_ops *gops)
 
 void unregister_ftrace_graph(struct fgraph_ops *gops)
 {
+	int i;
+
 	mutex_lock(&ftrace_lock);
 
 	if (unlikely(!ftrace_graph_active))
 		goto out;
 
 	ftrace_graph_active--;
-	ftrace_graph_return = (trace_func_graph_ret_t)ftrace_stub;
-	ftrace_graph_entry = ftrace_graph_entry_stub;
-	__ftrace_graph_entry = ftrace_graph_entry_stub;
-	ftrace_shutdown(&graph_ops, FTRACE_STOP_FUNC_RET);
-	unregister_pm_notifier(&ftrace_suspend_notifier);
-	unregister_trace_sched_switch(ftrace_graph_probe_sched_switch, NULL);
-
+	if (!ftrace_graph_active) {
+		ftrace_graph_return = (trace_func_graph_ret_t)ftrace_stub;
+		ftrace_graph_entry = ftrace_graph_entry_stub;
+		__ftrace_graph_entry = ftrace_graph_entry_stub;
+		ftrace_shutdown(&graph_ops, FTRACE_STOP_FUNC_RET);
+		unregister_pm_notifier(&ftrace_suspend_notifier);
+		unregister_trace_sched_switch(ftrace_graph_probe_sched_switch, NULL);
+	}
  out:
 	mutex_unlock(&ftrace_lock);
 }
-- 
2.20.1




* [RFC][PATCH 03/14 v2] function_graph: Allow multiple users to attach to function graph
  2019-05-20 14:20 [RFC][PATCH 00/14 v2] function_graph: Rewrite to allow multiple users Steven Rostedt
  2019-05-20 14:20 ` [RFC][PATCH 01/14 v2] function_graph: Convert ret_stack to a series of longs Steven Rostedt
  2019-05-20 14:20 ` [RFC][PATCH 02/14 v2] function_graph: Add an array structure that will allow multiple callbacks Steven Rostedt
@ 2019-05-20 14:20 ` Steven Rostedt
  2019-05-24 11:26   ` Peter Zijlstra
  2019-05-20 14:20 ` [RFC][PATCH 04/14 v2] function_graph: Remove logic around ftrace_graph_entry and return Steven Rostedt
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 31+ messages in thread
From: Steven Rostedt @ 2019-05-20 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Namhyung Kim,
	Frank Ch. Eigler

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Allow for multiple users to attach to the function graph tracer at the same
time. Only 16 simultaneous users can attach to the tracer. This is because
there's an array that stores the pointers to the attached fgraph_ops. When a
function being traced is entered, each fgraph_ops entryfunc is called, and
if it returns non-zero, its index into the array will be added to the shadow
stack.

On exit of the function being traced, the shadow stack will contain the
indexes of the fgraph_ops on the array that want their retfunc to be called.

Because a function may sleep for a long time (if a task sleeps itself), the
return of the function may be literally days later. If the fgraph_ops is
removed, its place on the array is replaced with an fgraph_ops that contains
the stub functions, and that will be called when the function finally
returns.

If another fgraph_ops is added that happens to get the same index into the
array, its return function may be called. But that's actually the way things
currently work with the old function graph tracer. If one tracer is removed
and another is added, the new one will get the return calls of the function
traced by the previous one, thus this is not a regression. This can be fixed
by adding a counter that is incremented each time the array item is updated
and saving it on the shadow stack as well, such that the retfunc won't be
called if the counter saved does not match the counter on the array.

Note, being able to filter functions when both are called is not completely
handled yet, but that shouldn't be too hard to manage.
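
For readers who want to see how an array index can share a shadow stack word
with the offset back to the ftrace_ret_stack, here is a small user-space
sketch of the bit layout documented in the comment this patch adds to
kernel/trace/fgraph.c (bits 0-13 offset in words, bits 14-15 type, bits
16-23 array index). The model_* names are hypothetical. The 8-bit array
field follows that comment and is an assumption of this sketch; the patch
derives FGRAPH_ARRAY_MASK from FGRAPH_ARRAY_SIZE instead.

```c
#include <assert.h>

#define MODEL_RET_INDEX_SIZE	14
#define MODEL_RET_INDEX_MASK	((1UL << MODEL_RET_INDEX_SIZE) - 1)

#define MODEL_TYPE_SIZE		2
#define MODEL_TYPE_MASK		((1UL << MODEL_TYPE_SIZE) - 1)
#define MODEL_TYPE_SHIFT	MODEL_RET_INDEX_SIZE

#define MODEL_ARRAY_BITS	8	/* assumption: bits 16 - 23 */
#define MODEL_ARRAY_MASK	((1UL << MODEL_ARRAY_BITS) - 1)
#define MODEL_ARRAY_SHIFT	(MODEL_TYPE_SHIFT + MODEL_TYPE_SIZE)

enum {
	MODEL_TYPE_RESERVED	= 0,
	MODEL_TYPE_ARRAY	= 1,
};

/* Pack an array index and the distance (in words) back to the ret_stack. */
static unsigned long model_encode(unsigned long array_index,
				  unsigned long words_back)
{
	return ((array_index & MODEL_ARRAY_MASK) << MODEL_ARRAY_SHIFT) |
	       ((unsigned long)MODEL_TYPE_ARRAY << MODEL_TYPE_SHIFT) |
	       (words_back & MODEL_RET_INDEX_MASK);
}

/* Distance in words from this word back to the ret_stack entry. */
static int model_ret_index(unsigned long word)
{
	return (int)(word & MODEL_RET_INDEX_MASK);
}

/* Type of storage held in this word. */
static int model_type(unsigned long word)
{
	return (int)((word >> MODEL_TYPE_SHIFT) & MODEL_TYPE_MASK);
}

/* Index of the ops in the array, valid when the type is MODEL_TYPE_ARRAY. */
static int model_array_index(unsigned long word)
{
	return (int)((word >> MODEL_ARRAY_SHIFT) & MODEL_ARRAY_MASK);
}
```

With this layout, the word for the fourth ops sitting two words above the
ret_stack entry would be model_encode(3, 2), and the return path can recover
both values with model_array_index() and model_ret_index().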

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 include/linux/ftrace.h |   2 +
 kernel/trace/fgraph.c  | 343 +++++++++++++++++++++++++++++++++++------
 2 files changed, 296 insertions(+), 49 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 8a8cb3c401b2..6fe69e0dc415 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -787,6 +787,8 @@ ftrace_graph_get_ret_stack(struct task_struct *task, int idx);
 unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
 				    unsigned long ret, unsigned long *retp);
 
+int function_graph_enter(unsigned long ret, unsigned long func,
+			 unsigned long frame_pointer, unsigned long *retp);
 /*
  * Sometimes we don't want to trace a function with the function
  * graph tracer but we want them to keep traced by the usual function
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index c765906d846d..b185d74aa5fa 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -25,23 +25,143 @@
 
 #define FGRAPH_RET_SIZE (sizeof(struct ftrace_ret_stack))
 #define FGRAPH_RET_INDEX (ALIGN(FGRAPH_RET_SIZE, sizeof(long)) / sizeof(long))
+
+/*
+ * On entry to a function (via function_graph_enter()), a new ftrace_ret_stack
+ * is allocated on the task's ret_stack, then each fgraph_ops on the
+ * fgraph_array[]'s entryfunc is called and if that returns non-zero, the
+ * index into the fgraph_array[] for that fgraph_ops is added to the ret_stack.
+ * As the associated ftrace_ret_stack saved for those fgraph_ops needs to
+ * be found, the index to it is also added to the ret_stack along with the
+ * index of the fgraph_array[] to each fgraph_ops that needs their retfunc
+ * called.
+ *
+ * The top of the ret_stack (when not empty) will always have a reference
+ * to the last ftrace_ret_stack saved. All references to the
+ * ftrace_ret_stack have the format of:
+ *
+ * bits:  0 - 13	Index in words from the previous ftrace_ret_stack
+ * bits: 14 - 15	Type of storage
+ *			  0 - reserved
+ *			  1 - fgraph_array index
+ * For fgraph_array_index:
+ *  bits: 16 - 23	The fgraph_ops fgraph_array index
+ *
+ * That is, at the end of function_graph_enter, if the first and fourth
+ * fgraph_ops on the fgraph_array[] (index 0 and 3) need their retfunc called
+ * on the return of the function being traced, this is what will be on the
+ * task's shadow ret_stack: (the stack grows upward)
+ *
+ * |                                  | <- task->curr_ret_stack
+ * +----------------------------------+
+ * | (3 << FGRAPH_ARRAY_SHIFT)|(2)    | ( 3 for index of fourth fgraph_ops)
+ * +----------------------------------+
+ * | (0 << FGRAPH_ARRAY_SHIFT)|(1)    | ( 0 for index of first fgraph_ops)
+ * +----------------------------------+
+ * | struct ftrace_ret_stack          |
+ * |   (stores the saved ret pointer) |
+ * +----------------------------------+
+ * |             (X) | (N)            | ( N words away from previous ret_stack)
+ * |                                  |
+ *
+ * If a backtrace is required, and the real return pointer needs to be
+ * fetched, then it looks at the task's curr_ret_stack index, if it
+ * is greater than zero, it would subtract one, and then mask the value
+ * on the ret_stack by FGRAPH_RET_INDEX_MASK and subtract FGRAPH_RET_INDEX
+ * from that, to get the index of the ftrace_ret_stack structure stored
+ * on the shadow stack.
+ */
+
+#define FGRAPH_RET_INDEX_SIZE	14
+#define FGRAPH_RET_INDEX_MASK	((1 << FGRAPH_RET_INDEX_SIZE) - 1)
+
+
+#define FGRAPH_TYPE_SIZE	2
+#define FGRAPH_TYPE_MASK	((1 << FGRAPH_TYPE_SIZE) - 1)
+#define FGRAPH_TYPE_SHIFT	FGRAPH_RET_INDEX_SIZE
+
+enum {
+	FGRAPH_TYPE_RESERVED	= 0,
+	FGRAPH_TYPE_ARRAY	= 1,
+};
+
+#define FGRAPH_ARRAY_SIZE	16
+#define FGRAPH_ARRAY_MASK	((1 << FGRAPH_ARRAY_SIZE) - 1)
+#define FGRAPH_ARRAY_SHIFT	(FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_SIZE)
+
+/* Currently the max stack index can't be more than registered callers */
+#define FGRAPH_MAX_INDEX	FGRAPH_ARRAY_SIZE
+
+#define FGRAPH_FRAME_SIZE (FGRAPH_RET_SIZE + FGRAPH_ARRAY_SIZE * (sizeof(long)))
+#define FGRAPH_FRAME_INDEX (ALIGN(FGRAPH_FRAME_SIZE,		\
+				  sizeof(long)) / sizeof(long))
 #define SHADOW_STACK_SIZE (PAGE_SIZE)
 #define SHADOW_STACK_INDEX			\
 	(ALIGN(SHADOW_STACK_SIZE, sizeof(long)) / sizeof(long))
 /* Leave on a buffer at the end */
-#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - FGRAPH_RET_INDEX)
+#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - (FGRAPH_RET_INDEX + 1))
 
 #define RET_STACK(t, index) ((struct ftrace_ret_stack *)(&(t)->ret_stack[index]))
-#define RET_STACK_INC(c) ({ c += FGRAPH_RET_INDEX; })
-#define RET_STACK_DEC(c) ({ c -= FGRAPH_RET_INDEX; })
 
 static bool kill_ftrace_graph;
 int ftrace_graph_active;
 
-#define FGRAPH_ARRAY_SIZE	16
+static int fgraph_array_cnt;
 
 static struct fgraph_ops *fgraph_array[FGRAPH_ARRAY_SIZE];
 
+static inline int get_ret_stack_index(struct task_struct *t, int offset)
+{
+	return t->ret_stack[offset] & FGRAPH_RET_INDEX_MASK;
+}
+
+static inline int get_fgraph_type(struct task_struct *t, int offset)
+{
+	return (t->ret_stack[offset] >> FGRAPH_TYPE_SHIFT) &
+		FGRAPH_TYPE_MASK;
+}
+
+static inline int get_fgraph_array(struct task_struct *t, int offset)
+{
+	return (t->ret_stack[offset] >> FGRAPH_ARRAY_SHIFT) &
+		FGRAPH_ARRAY_MASK;
+}
+
+/*
+ * @offset: The index into @t->ret_stack to find the ret_stack entry
+ * @index: Where to place the index into @t->ret_stack of that entry
+ *
+ * Calling this with:
+ *
+ *   offset = task->curr_ret_stack;
+ *   do {
+ *	ret_stack = get_ret_stack(task, offset, &offset);
+ *   } while (ret_stack);
+ *
+ * Will iterate through all the ret_stack entries from curr_ret_stack
+ * down to the first one.
+ */
+static inline struct ftrace_ret_stack *
+get_ret_stack(struct task_struct *t, int offset, int *index)
+{
+	int idx;
+
+	if (offset <= 0)
+		return NULL;
+
+	idx = get_ret_stack_index(t, offset - 1);
+
+	if (idx <= 0 || idx > FGRAPH_MAX_INDEX)
+		return NULL;
+
+	offset -= idx + FGRAPH_RET_INDEX;
+	if (offset < 0)
+		return NULL;
+
+	*index = offset;
+	return RET_STACK(t, offset);
+}
+
 /* Both enabled by default (can be cleared by function_graph tracer flags */
 static bool fgraph_sleep_time = true;
 
@@ -114,9 +234,34 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
 	calltime = trace_clock_local();
 
 	index = current->curr_ret_stack;
-	RET_STACK_INC(current->curr_ret_stack);
+	/* ret offset = 1 ; type = reserved */
+	current->ret_stack[index + FGRAPH_RET_INDEX] = 1;
 	ret_stack = RET_STACK(current, index);
+	ret_stack->ret = ret;
+	/*
+	 * The unwinders expect curr_ret_stack to point to either zero
+	 * or an index where to find the next ret_stack. Even though the
+	 * ret stack might be bogus, we want to write the ret and the
+	 * index to find the ret_stack before we increment the stack pointer.
+	 * If an interrupt comes in now before we increment the curr_ret_stack
+	 * it may blow away what we wrote. But that's fine, because the
+	 * index will still be correct (even though the ret won't be).
+	 * What we worry about is the index being correct after we increment
+	 * the curr_ret_stack and before we update that index, as if an
+	 * interrupt comes in and does an unwind stack dump, it will need
+	 * at least a correct index!
+	 */
+	barrier();
+	current->curr_ret_stack += FGRAPH_RET_INDEX + 1;
+	/*
+	 * This next barrier is to ensure that an interrupt coming in
+	 * will not corrupt what we are about to write.
+	 */
 	barrier();
+
+	/* Still keep it reserved even if an interrupt came in */
+	current->ret_stack[index + FGRAPH_RET_INDEX] = 1;
+
 	ret_stack->ret = ret;
 	ret_stack->func = func;
 	ret_stack->calltime = calltime;
@@ -133,6 +278,12 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 			 unsigned long frame_pointer, unsigned long *retp)
 {
 	struct ftrace_graph_ent trace;
+	int offset;
+	int start;
+	int type;
+	int val;
+	int cnt = 0;
+	int i;
 
 	trace.func = func;
 	trace.depth = ++current->curr_ret_depth;
@@ -140,38 +291,87 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 	if (ftrace_push_return_trace(ret, func, frame_pointer, retp))
 		goto out;
 
-	/* Only trace if the calling function expects to */
-	if (!fgraph_array[0]->entryfunc(&trace))
+	/* Use start for the distance to ret_stack (skipping over reserve) */
+	start = offset = current->curr_ret_stack - 2;
+
+	for (i = 0; i < fgraph_array_cnt; i++) {
+		struct fgraph_ops *gops = fgraph_array[i];
+
+		if (gops == &fgraph_stub)
+			continue;
+
+		if ((offset == start) &&
+		    (current->curr_ret_stack >= SHADOW_STACK_INDEX - 1)) {
+			atomic_inc(&current->trace_overrun);
+			break;
+		}
+		if (fgraph_array[i]->entryfunc(&trace)) {
+			offset = current->curr_ret_stack;
+			/* Check the top level stored word */
+			type = get_fgraph_type(current, offset - 1);
+
+			val = (i << FGRAPH_ARRAY_SHIFT) |
+				(FGRAPH_TYPE_ARRAY << FGRAPH_TYPE_SHIFT) |
+				((offset - start) - 1);
+
+			/* We can reuse the top word if it is reserved */
+			if (type == FGRAPH_TYPE_RESERVED) {
+				current->ret_stack[offset - 1] = val;
+				cnt++;
+				continue;
+			}
+			val++;
+
+			current->ret_stack[offset] = val;
+			/*
+			 * Write the value before we increment, so that
+			 * if an interrupt comes in after we increment
+			 * it will still see the value and skip over
+			 * this.
+			 */
+			barrier();
+			current->curr_ret_stack++;
+			/*
+			 * Have to write again, in case an interrupt
+			 * came in before the increment and after we
+			 * wrote the value.
+			 */
+			barrier();
+			current->ret_stack[offset] = val;
+			cnt++;
+		}
+	}
+
+	if (!cnt)
 		goto out_ret;
 
 	return 0;
  out_ret:
-	RET_STACK_DEC(current->curr_ret_stack);
+	current->curr_ret_stack -= FGRAPH_RET_INDEX + 1;
  out:
 	current->curr_ret_depth--;
 	return -EBUSY;
 }
 
 /* Retrieve a function return address to the trace stack on thread info.*/
-static void
+static struct ftrace_ret_stack *
 ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
 			unsigned long frame_pointer)
 {
 	struct ftrace_ret_stack *ret_stack;
 	int index;
 
-	index = current->curr_ret_stack;
-	RET_STACK_DEC(index);
+	ret_stack = get_ret_stack(current, current->curr_ret_stack, &index);
 
-	if (unlikely(index < 0 || index > SHADOW_STACK_MAX_INDEX)) {
+	if (unlikely(!ret_stack)) {
 		ftrace_graph_stop();
-		WARN_ON(1);
+		WARN(1, "Bad function graph ret_stack pointer: %d",
+		     current->curr_ret_stack);
 		/* Might as well panic, otherwise we have no where to go */
 		*ret = (unsigned long)panic;
-		return;
+		return NULL;
 	}
 
-	ret_stack = RET_STACK(current, index);
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	/*
 	 * The arch may choose to record the frame pointer used
@@ -191,12 +391,12 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
 		ftrace_graph_stop();
 		WARN(1, "Bad frame pointer: expected %lx, received %lx\n"
 		     "  from func %ps return to %lx\n",
-		     current->ret_stack[index].fp,
+		     ret_stack->fp,
 		     frame_pointer,
 		     (void *)ret_stack->func,
 		     ret_stack->ret);
 		*ret = (unsigned long)panic;
-		return;
+		return NULL;
 	}
 #endif
 
@@ -204,13 +404,15 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
 	trace->func = ret_stack->func;
 	trace->calltime = ret_stack->calltime;
 	trace->overrun = atomic_read(&current->trace_overrun);
-	trace->depth = current->curr_ret_depth--;
+	trace->depth = current->curr_ret_depth;
 	/*
 	 * We still want to trace interrupts coming in if
 	 * max_depth is set to 1. Make sure the decrement is
 	 * seen before ftrace_graph_return.
 	 */
 	barrier();
+
+	return ret_stack;
 }
 
 /*
@@ -244,27 +446,44 @@ static struct notifier_block ftrace_suspend_notifier = {
  */
 unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
 {
+	struct ftrace_ret_stack *ret_stack;
 	struct ftrace_graph_ret trace;
 	unsigned long ret;
+	int offset;
+	int index;
+	int idx;
+	int i;
 
-	ftrace_pop_return_trace(&trace, &ret, frame_pointer);
-	trace.rettime = trace_clock_local();
-	fgraph_array[0]->retfunc(&trace);
-	/*
-	 * The ftrace_graph_return() may still access the current
-	 * ret_stack structure, we need to make sure the update of
-	 * curr_ret_stack is after that.
-	 */
-	barrier();
-	RET_STACK_DEC(current->curr_ret_stack);
+	ret_stack = ftrace_pop_return_trace(&trace, &ret, frame_pointer);
 
 	if (unlikely(!ret)) {
 		ftrace_graph_stop();
 		WARN_ON(1);
 		/* Might as well panic. What else to do? */
-		ret = (unsigned long)panic;
+		return (unsigned long)panic;
 	}
 
+	trace.rettime = trace_clock_local();
+
+	offset = current->curr_ret_stack - 1;
+	index = get_ret_stack_index(current, offset);
+
+	/* index has to be at least one! Optimize for it */
+	i = 0;
+	do {
+		idx = get_fgraph_array(current, offset - i);
+		fgraph_array[idx]->retfunc(&trace);
+		i++;
+	} while (i < index);
+
+	/*
+	 * The ftrace_graph_return() may still access the current
+	 * ret_stack structure, we need to make sure the update of
+	 * curr_ret_stack is after that.
+	 */
+	barrier();
+	current->curr_ret_stack -= index + FGRAPH_RET_INDEX;
+	current->curr_ret_depth--;
 	return ret;
 }
 
@@ -282,13 +501,17 @@ unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
 struct ftrace_ret_stack *
 ftrace_graph_get_ret_stack(struct task_struct *task, int idx)
 {
+	struct ftrace_ret_stack *ret_stack = NULL;
 	int index = task->curr_ret_stack;
 
-	index -= FGRAPH_RET_INDEX * (idx + 1);
 	if (index < 0)
 		return NULL;
 
-	return RET_STACK(task, index);
+	do {
+		ret_stack = get_ret_stack(task, index, &index);
+	} while (ret_stack && --idx >= 0);
+
+	return ret_stack;
 }
 
 /**
@@ -311,16 +534,15 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
 				    unsigned long ret, unsigned long *retp)
 {
 	struct ftrace_ret_stack *ret_stack;
-	int index = task->curr_ret_stack;
-	int i;
+	int i = task->curr_ret_stack;
 
 	if (ret != (unsigned long)return_to_handler)
 		return ret;
 
-	RET_STACK_DEC(index);
-
-	for (i = index; i >= 0; RET_STACK_DEC(i)) {
-		ret_stack = RET_STACK(task, i);
+	while (i > 0) {
+		ret_stack = get_ret_stack(task, i, &i);
+		if (!ret_stack)
+			break;
 		if (ret_stack->retp == retp)
 			return ret_stack->ret;
 	}
@@ -331,21 +553,26 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
 unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
 				    unsigned long ret, unsigned long *retp)
 {
-	int task_idx;
+	struct ftrace_ret_stack *ret_stack;
+	int task_idx = task->curr_ret_stack;
+	int i;
 
 	if (ret != (unsigned long)return_to_handler)
 		return ret;
 
-	task_idx = task->curr_ret_stack;
-	RET_STACK_DEC(task_idx);
-
-	if (!task->ret_stack || task_idx < *idx)
+	if (!idx)
 		return ret;
 
-	task_idx -= *idx;
-	RET_STACK_INC(*idx);
+	i = *idx;
+	do {
+		ret_stack = get_ret_stack(task, task_idx, &task_idx);
+		i--;
+	} while (i >= 0 && ret_stack);
 
-	return RET_STACK(task, task_idx);
+	if (ret_stack)
+		return ret_stack->ret;
+
+	return ret;
 }
 #endif /* HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
 
@@ -446,10 +673,10 @@ ftrace_graph_probe_sched_switch(void *ignore, bool preempt,
 	 */
 	timestamp -= next->ftrace_timestamp;
 
-	for (index = next->curr_ret_stack - FGRAPH_RET_INDEX; index >= 0; ) {
-		ret_stack = RET_STACK(next, index);
-		ret_stack->calltime += timestamp;
-		index -= FGRAPH_RET_INDEX;
+	for (index = next->curr_ret_stack; index > 0; ) {
+		ret_stack = get_ret_stack(next, index, &index);
+		if (ret_stack)
+			ret_stack->calltime += timestamp;
 	}
 }
 
@@ -501,6 +728,8 @@ graph_init_task(struct task_struct *t, unsigned long *ret_stack)
 	atomic_set(&t->tracing_graph_pause, 0);
 	atomic_set(&t->trace_overrun, 0);
 	t->ftrace_timestamp = 0;
+	t->curr_ret_stack = 0;
+	t->curr_ret_depth = -1;
 	/* make curr_ret_stack visible before we add the ret_stack */
 	smp_wmb();
 	t->ret_stack = ret_stack;
@@ -621,6 +850,8 @@ int register_ftrace_graph(struct fgraph_ops *gops)
 	}
 
 	fgraph_array[i] = gops;
+	if (i + 1 > fgraph_array_cnt)
+		fgraph_array_cnt = i + 1;
 
 	ftrace_graph_active++;
 
@@ -660,6 +891,20 @@ void unregister_ftrace_graph(struct fgraph_ops *gops)
 	if (unlikely(!ftrace_graph_active))
 		goto out;
 
+	for (i = 0; i < fgraph_array_cnt; i++)
+		if (gops == fgraph_array[i])
+			break;
+	if (i >= fgraph_array_cnt)
+		goto out;
+
+	fgraph_array[i] = &fgraph_stub;
+	if (i + 1 == fgraph_array_cnt) {
+		for (; i >= 0; i--)
+			if (fgraph_array[i] != &fgraph_stub)
+				break;
+		fgraph_array_cnt = i + 1;
+	}
+
 	ftrace_graph_active--;
 	if (!ftrace_graph_active) {
 		ftrace_graph_return = (trace_func_graph_ret_t)ftrace_stub;
-- 
2.20.1
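
The walk that get_ret_stack() performs can be modeled in user space. This is
a minimal sketch, not the kernel code: it assumes each frame record takes
RET_WORDS words and is followed by one index word per accepting callback,
where the k-th word holds k (the distance back to the end of the record).
The type and array bits are left out, and RET_WORDS, INDEX_MASK, MAX_INDEX,
STACK_WORDS, push_frame(), frame_below() and count_frames() are all
hypothetical names with assumed values.

```c
#include <assert.h>

#define RET_WORDS	4	/* assumed words per frame record */
#define INDEX_MASK	0x3ffUL	/* assumed mask for the index bits */
#define MAX_INDEX	16	/* assumed upper bound on index words */
#define STACK_WORDS	64	/* toy shadow stack size, in words */

static unsigned long stack[STACK_WORDS];
static int top;			/* plays the role of curr_ret_stack */

/* Push one frame record followed by nr_users index words. */
static int push_frame(unsigned long ret, int nr_users)
{
	int base = top;
	int k;

	assert(nr_users >= 1 && nr_users <= MAX_INDEX);
	if (base + RET_WORDS + nr_users > STACK_WORDS)
		return -1;

	stack[base] = ret;	/* first word of the record */
	for (k = 1; k <= nr_users; k++)
		stack[base + RET_WORDS + k - 1] = (unsigned long)k;

	top = base + RET_WORDS + nr_users;
	return base;
}

/* Return the start of the frame record below offset, or -1 if none. */
static int frame_below(int offset)
{
	int idx;

	if (offset <= 0)
		return -1;

	/* The word just below offset says how far back the record ends */
	idx = (int)(stack[offset - 1] & INDEX_MASK);
	if (idx <= 0 || idx > MAX_INDEX)
		return -1;

	offset -= idx + RET_WORDS;
	if (offset < 0)
		return -1;

	return offset;
}

/* Walk from the top of the stack down to the first frame. */
static int count_frames(void)
{
	int offset = top;
	int n = 0;

	while ((offset = frame_below(offset)) >= 0)
		n++;
	return n;
}
```

Pushing a frame with one user and then a frame with three users leaves top
at 12 in this model. The walk then visits the records at 5 and 0 before it
stops, which is the same top-down order described in the get_ret_stack()
comment.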




* [RFC][PATCH 04/14 v2] function_graph: Remove logic around ftrace_graph_entry and return
  2019-05-20 14:20 [RFC][PATCH 00/14 v2] function_graph: Rewrite to allow multiple users Steven Rostedt
                   ` (2 preceding siblings ...)
  2019-05-20 14:20 ` [RFC][PATCH 03/14 v2] function_graph: Allow multiple users to attach to function graph Steven Rostedt
@ 2019-05-20 14:20 ` Steven Rostedt
  2019-05-20 14:20 ` [RFC][PATCH 05/14 v2] ftrace/function_graph: Pass fgraph_ops to function graph callbacks Steven Rostedt
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Steven Rostedt @ 2019-05-20 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Namhyung Kim,
	Frank Ch. Eigler

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

The function pointers ftrace_graph_entry and ftrace_graph_return are no
longer called via the function_graph tracer. Instead, an array structure is
now used that will allow for multiple users of the function_graph
infrastructure. The variables are still used by the architecture code for
non-dynamic ftrace configs, where a test is made against them to see if they
point to the default stub function or not. This is how the static function
tracing knows whether to call into the function graph tracer infrastructure.

Two new stub functions are added: entry_run() and return_run(). The
ftrace_graph_entry and ftrace_graph_return are set to them respectively when
the function graph tracer is enabled, and this will trigger the architecture
specific function graph code to be executed.

This also requires checking the global_ops hash for all calls into the
function_graph tracer.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
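
The marker-stub idea described above can be sketched in plain C. This is
only an illustration under assumptions: the demo_* names and
arch_should_run_graph() are hypothetical, and the real arch check lives in
architecture specific code that is not shown in this patch. The point is
that the pointers are compared against the default stubs and never called
through.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_ent { unsigned long func; };
struct demo_ret { unsigned long func; };

typedef int (*demo_ent_fn)(struct demo_ent *);
typedef void (*demo_ret_fn)(struct demo_ret *);

/* Default stubs: pointing here means "graph tracing is off" */
static int demo_entry_stub(struct demo_ent *e) { (void)e; return 0; }
static void demo_return_stub(struct demo_ret *r) { (void)r; }

/* Marker stubs: same empty bodies, but different addresses */
static int demo_entry_run(struct demo_ent *e) { (void)e; return 0; }
static void demo_return_run(struct demo_ret *r) { (void)r; }

static demo_ent_fn demo_graph_entry = demo_entry_stub;
static demo_ret_fn demo_graph_return = demo_return_stub;

/* Registration only flips the pointers to the markers */
static void demo_graph_enable(void)
{
	demo_graph_entry = demo_entry_run;
	demo_graph_return = demo_return_run;
}

static void demo_graph_disable(void)
{
	demo_graph_entry = demo_entry_stub;
	demo_graph_return = demo_return_stub;
}

/* What a static (non-dynamic) ftrace arch path is assumed to test */
static bool arch_should_run_graph(void)
{
	/* "Off" is encoded as the default stub, never as NULL */
	assert(demo_graph_entry && demo_graph_return);

	return demo_graph_entry != demo_entry_stub ||
	       demo_graph_return != demo_return_stub;
}
```

With this shape, the callbacks that actually run are found through the
fgraph_array, and the two global pointers only act as an on/off flag for
the arch code.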
 kernel/trace/fgraph.c          | 71 +++++++++-------------------------
 kernel/trace/ftrace.c          |  2 -
 kernel/trace/ftrace_internal.h |  2 -
 3 files changed, 19 insertions(+), 56 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index b185d74aa5fa..ce7212830207 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -127,6 +127,18 @@ static inline int get_fgraph_array(struct task_struct *t, int offset)
 		FGRAPH_ARRAY_MASK;
 }
 
+/* ftrace_graph_entry set to this to tell some archs to run function graph */
+static int entry_run(struct ftrace_graph_ent *trace)
+{
+	return 0;
+}
+
+/* ftrace_graph_return set to this to tell some archs to run function graph */
+static void return_run(struct ftrace_graph_ret *trace)
+{
+	return;
+}
+
 /*
  * @offset: The index into @t->ret_stack to find the ret_stack entry
  * @index: Where to place the index into @t->ret_stack of that entry
@@ -285,6 +297,9 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 	int cnt = 0;
 	int i;
 
+	if (!ftrace_ops_test(&global_ops, func, NULL))
+		goto out;
+
 	trace.func = func;
 	trace.depth = ++current->curr_ret_depth;
 
@@ -598,7 +613,6 @@ void ftrace_graph_sleep_time_control(bool enable)
 trace_func_graph_ret_t ftrace_graph_return =
 			(trace_func_graph_ret_t)ftrace_stub;
 trace_func_graph_ent_t ftrace_graph_entry = ftrace_graph_entry_stub;
-static trace_func_graph_ent_t __ftrace_graph_entry = ftrace_graph_entry_stub;
 
 /* Try to assign a return stack array on FTRACE_RETSTACK_ALLOC_SIZE tasks. */
 static int alloc_retstack_tasklist(unsigned long **ret_stack_list)
@@ -680,46 +694,6 @@ ftrace_graph_probe_sched_switch(void *ignore, bool preempt,
 	}
 }
 
-static int ftrace_graph_entry_test(struct ftrace_graph_ent *trace)
-{
-	if (!ftrace_ops_test(&global_ops, trace->func, NULL))
-		return 0;
-	return __ftrace_graph_entry(trace);
-}
-
-/*
- * The function graph tracer should only trace the functions defined
- * by set_ftrace_filter and set_ftrace_notrace. If another function
- * tracer ops is registered, the graph tracer requires testing the
- * function against the global ops, and not just trace any function
- * that any ftrace_ops registered.
- */
-void update_function_graph_func(void)
-{
-	struct ftrace_ops *op;
-	bool do_test = false;
-
-	/*
-	 * The graph and global ops share the same set of functions
-	 * to test. If any other ops is on the list, then
-	 * the graph tracing needs to test if its the function
-	 * it should call.
-	 */
-	do_for_each_ftrace_op(op, ftrace_ops_list) {
-		if (op != &global_ops && op != &graph_ops &&
-		    op != &ftrace_list_end) {
-			do_test = true;
-			/* in double loop, break out with goto */
-			goto out;
-		}
-	} while_for_each_ftrace_op(op);
- out:
-	if (do_test)
-		ftrace_graph_entry = ftrace_graph_entry_test;
-	else
-		ftrace_graph_entry = __ftrace_graph_entry;
-}
-
 static DEFINE_PER_CPU(unsigned long *, idle_ret_stack);
 
 static void
@@ -862,18 +836,12 @@ int register_ftrace_graph(struct fgraph_ops *gops)
 			ftrace_graph_active--;
 			goto out;
 		}
-
-		ftrace_graph_return = gops->retfunc;
-
 		/*
-		 * Update the indirect function to the entryfunc, and the
-		 * function that gets called to the entry_test first. Then
-		 * call the update fgraph entry function to determine if
-		 * the entryfunc should be called directly or not.
+		 * Some archs just test to see if these are not
+		 * the default function
 		 */
-		__ftrace_graph_entry = gops->entryfunc;
-		ftrace_graph_entry = ftrace_graph_entry_test;
-		update_function_graph_func();
+		ftrace_graph_return = return_run;
+		ftrace_graph_entry = entry_run;
 
 		ret = ftrace_startup(&graph_ops, FTRACE_START_FUNC_RET);
 	}
@@ -909,7 +877,6 @@ void unregister_ftrace_graph(struct fgraph_ops *gops)
 	if (!ftrace_graph_active) {
 		ftrace_graph_return = (trace_func_graph_ret_t)ftrace_stub;
 		ftrace_graph_entry = ftrace_graph_entry_stub;
-		__ftrace_graph_entry = ftrace_graph_entry_stub;
 		ftrace_shutdown(&graph_ops, FTRACE_STOP_FUNC_RET);
 		unregister_pm_notifier(&ftrace_suspend_notifier);
 		unregister_trace_sched_switch(ftrace_graph_probe_sched_switch, NULL);
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 47b41502a24c..9f3282d10f47 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -212,8 +212,6 @@ static void update_ftrace_function(void)
 		func = ftrace_ops_list_func;
 	}
 
-	update_function_graph_func();
-
 	/* If there's no change, then do nothing more here */
 	if (ftrace_trace_function == func)
 		return;
diff --git a/kernel/trace/ftrace_internal.h b/kernel/trace/ftrace_internal.h
index 0515a2096f90..60f685bec837 100644
--- a/kernel/trace/ftrace_internal.h
+++ b/kernel/trace/ftrace_internal.h
@@ -63,10 +63,8 @@ ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip, void *regs)
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 extern int ftrace_graph_active;
-void update_function_graph_func(void);
 #else /* !CONFIG_FUNCTION_GRAPH_TRACER */
 # define ftrace_graph_active 0
-static inline void update_function_graph_func(void) { }
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
 
 #else /* !CONFIG_FUNCTION_TRACER */
-- 
2.20.1




* [RFC][PATCH 05/14 v2] ftrace/function_graph: Pass fgraph_ops to function graph callbacks
  2019-05-20 14:20 [RFC][PATCH 00/14 v2] function_graph: Rewrite to allow multiple users Steven Rostedt
                   ` (3 preceding siblings ...)
  2019-05-20 14:20 ` [RFC][PATCH 04/14 v2] function_graph: Remove logic around ftrace_graph_entry and return Steven Rostedt
@ 2019-05-20 14:20 ` Steven Rostedt
  2019-05-20 14:20 ` [RFC][PATCH 06/14 v2] ftrace: Allow function_graph tracer to be enabled in instances Steven Rostedt
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Steven Rostedt @ 2019-05-20 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Namhyung Kim,
	Frank Ch. Eigler

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Pass the fgraph_ops structure to the function graph callbacks. This will
allow callbacks to add a descriptor to a fgraph_ops private field that will
be added in the future and use it for the callbacks. This will be useful
when more than one callback can be registered to the function graph tracer.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
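
As a rough user-space sketch of the calling convention this introduces: the
dispatcher hands each callback the ops structure it was registered with, so
one callback function can serve several registrants. The demo_* names and
the hits counter are hypothetical and stand in for whatever state a real
user would keep.

```c
#include <assert.h>
#include <stddef.h>

struct demo_ops;

typedef int (*demo_entry_fn)(unsigned long func, struct demo_ops *ops);

struct demo_ops {
	demo_entry_fn	entryfunc;
	int		hits;	/* hypothetical per-registrant state */
};

/* One callback shared by several registrants */
static int demo_count_entry(unsigned long func, struct demo_ops *ops)
{
	(void)func;
	ops->hits++;
	return 1;	/* non-zero: this registrant wants the return too */
}

/* Call every registered entry callback, passing it its own ops */
static int demo_dispatch_entry(struct demo_ops **array, size_t n,
			       unsigned long func)
{
	size_t i;
	int cnt = 0;

	for (i = 0; i < n; i++) {
		/* Slots are assumed to be filled, as with the stub ops */
		assert(array[i] && array[i]->entryfunc);
		if (array[i]->entryfunc(func, array[i]))
			cnt++;
	}
	return cnt;
}
```

Because the callback gets its ops back, it does not need a file-scope
variable to find its own state, which is what makes more than one
registrant per callback practical.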
 include/linux/ftrace.h               | 10 +++++++---
 kernel/trace/fgraph.c                | 14 ++++++++------
 kernel/trace/ftrace.c                |  6 ++++--
 kernel/trace/trace.h                 |  4 ++--
 kernel/trace/trace_functions_graph.c | 11 +++++++----
 kernel/trace/trace_irqsoff.c         |  6 ++++--
 kernel/trace/trace_sched_wakeup.c    |  6 ++++--
 kernel/trace/trace_selftest.c        |  5 +++--
 8 files changed, 39 insertions(+), 23 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 6fe69e0dc415..906f7c25faa6 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -737,11 +737,15 @@ struct ftrace_graph_ret {
 	int depth;
 } __packed;
 
+struct fgraph_ops;
+
 /* Type of the callback handlers for tracing function graph*/
-typedef void (*trace_func_graph_ret_t)(struct ftrace_graph_ret *); /* return */
-typedef int (*trace_func_graph_ent_t)(struct ftrace_graph_ent *); /* entry */
+typedef void (*trace_func_graph_ret_t)(struct ftrace_graph_ret *,
+				       struct fgraph_ops *); /* return */
+typedef int (*trace_func_graph_ent_t)(struct ftrace_graph_ent *,
+				      struct fgraph_ops *); /* entry */
 
-extern int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace);
+extern int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace, struct fgraph_ops *gops);
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index ce7212830207..0af9d40c4363 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -128,13 +128,13 @@ static inline int get_fgraph_array(struct task_struct *t, int offset)
 }
 
 /* ftrace_graph_entry set to this to tell some archs to run function graph */
-static int entry_run(struct ftrace_graph_ent *trace)
+static int entry_run(struct ftrace_graph_ent *trace, struct fgraph_ops *ops)
 {
 	return 0;
 }
 
 /* ftrace_graph_return set to this to tell some archs to run function graph */
-static void return_run(struct ftrace_graph_ret *trace)
+static void return_run(struct ftrace_graph_ret *trace, struct fgraph_ops *ops)
 {
 	return;
 }
@@ -177,12 +177,14 @@ get_ret_stack(struct task_struct *t, int offset, int *index)
 /* Both enabled by default (can be cleared by function_graph tracer flags */
 static bool fgraph_sleep_time = true;
 
-int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace)
+int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace,
+			    struct fgraph_ops *gops)
 {
 	return 0;
 }
 
-static void ftrace_graph_ret_stub(struct ftrace_graph_ret *trace)
+static void ftrace_graph_ret_stub(struct ftrace_graph_ret *trace,
+				  struct fgraph_ops *gops)
 {
 }
 
@@ -320,7 +322,7 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 			atomic_inc(&current->trace_overrun);
 			break;
 		}
-		if (fgraph_array[i]->entryfunc(&trace)) {
+		if (fgraph_array[i]->entryfunc(&trace, fgraph_array[i])) {
 			offset = current->curr_ret_stack;
 			/* Check the top level stored word */
 			type = get_fgraph_type(current, offset - 1);
@@ -487,7 +489,7 @@ unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
 	i = 0;
 	do {
 		idx = get_fgraph_array(current, offset - i);
-		fgraph_array[idx]->retfunc(&trace);
+		fgraph_array[idx]->retfunc(&trace, fgraph_array[idx]);
 		i++;
 	} while (i < index);
 
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 9f3282d10f47..64b60217d037 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -790,7 +790,8 @@ void ftrace_graph_graph_time_control(bool enable)
 	fgraph_graph_time = enable;
 }
 
-static int profile_graph_entry(struct ftrace_graph_ent *trace)
+static int profile_graph_entry(struct ftrace_graph_ent *trace,
+			       struct fgraph_ops *gops)
 {
 	struct ftrace_ret_stack *ret_stack;
 
@@ -807,7 +808,8 @@ static int profile_graph_entry(struct ftrace_graph_ent *trace)
 	return 1;
 }
 
-static void profile_graph_return(struct ftrace_graph_ret *trace)
+static void profile_graph_return(struct ftrace_graph_ret *trace,
+				 struct fgraph_ops *gops)
 {
 	struct ftrace_ret_stack *ret_stack;
 	struct ftrace_profile_stat *stat;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 1974ce818ddb..48f152ca3558 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -737,8 +737,8 @@ void trace_default_header(struct seq_file *m);
 void print_trace_header(struct seq_file *m, struct trace_iterator *iter);
 int trace_empty(struct trace_iterator *iter);
 
-void trace_graph_return(struct ftrace_graph_ret *trace);
-int trace_graph_entry(struct ftrace_graph_ent *trace);
+void trace_graph_return(struct ftrace_graph_ret *trace, struct fgraph_ops *gops);
+int trace_graph_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops);
 void set_graph_array(struct trace_array *tr);
 
 void tracing_start_cmdline_record(void);
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 69ebf3c2f1b5..2ae21788fcaf 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -124,7 +124,8 @@ static inline int ftrace_graph_ignore_irqs(void)
 	return in_irq();
 }
 
-int trace_graph_entry(struct ftrace_graph_ent *trace)
+int trace_graph_entry(struct ftrace_graph_ent *trace,
+		      struct fgraph_ops *gops)
 {
 	struct trace_array *tr = graph_array;
 	struct trace_array_cpu *data;
@@ -237,7 +238,8 @@ void __trace_graph_return(struct trace_array *tr,
 		trace_buffer_unlock_commit_nostack(buffer, event);
 }
 
-void trace_graph_return(struct ftrace_graph_ret *trace)
+void trace_graph_return(struct ftrace_graph_ret *trace,
+			struct fgraph_ops *gops)
 {
 	struct trace_array *tr = graph_array;
 	struct trace_array_cpu *data;
@@ -274,7 +276,8 @@ void set_graph_array(struct trace_array *tr)
 	smp_mb();
 }
 
-static void trace_graph_thresh_return(struct ftrace_graph_ret *trace)
+static void trace_graph_thresh_return(struct ftrace_graph_ret *trace,
+				      struct fgraph_ops *gops)
 {
 	ftrace_graph_addr_finish(trace);
 
@@ -287,7 +290,7 @@ static void trace_graph_thresh_return(struct ftrace_graph_ret *trace)
 	    (trace->rettime - trace->calltime < tracing_thresh))
 		return;
 	else
-		trace_graph_return(trace);
+		trace_graph_return(trace, gops);
 }
 
 static struct fgraph_ops funcgraph_thresh_ops = {
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index a745b0cee5d3..55c547f6e31d 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -172,7 +172,8 @@ static int irqsoff_display_graph(struct trace_array *tr, int set)
 	return start_irqsoff_tracer(irqsoff_trace, set);
 }
 
-static int irqsoff_graph_entry(struct ftrace_graph_ent *trace)
+static int irqsoff_graph_entry(struct ftrace_graph_ent *trace,
+			       struct fgraph_ops *gops)
 {
 	struct trace_array *tr = irqsoff_trace;
 	struct trace_array_cpu *data;
@@ -202,7 +203,8 @@ static int irqsoff_graph_entry(struct ftrace_graph_ent *trace)
 	return ret;
 }
 
-static void irqsoff_graph_return(struct ftrace_graph_ret *trace)
+static void irqsoff_graph_return(struct ftrace_graph_ret *trace,
+				 struct fgraph_ops *gops)
 {
 	struct trace_array *tr = irqsoff_trace;
 	struct trace_array_cpu *data;
diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
index 743b2b520d34..9da1062a8181 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -112,7 +112,8 @@ static int wakeup_display_graph(struct trace_array *tr, int set)
 	return start_func_tracer(tr, set);
 }
 
-static int wakeup_graph_entry(struct ftrace_graph_ent *trace)
+static int wakeup_graph_entry(struct ftrace_graph_ent *trace,
+			      struct fgraph_ops *gops)
 {
 	struct trace_array *tr = wakeup_trace;
 	struct trace_array_cpu *data;
@@ -142,7 +143,8 @@ static int wakeup_graph_entry(struct ftrace_graph_ent *trace)
 	return ret;
 }
 
-static void wakeup_graph_return(struct ftrace_graph_ret *trace)
+static void wakeup_graph_return(struct ftrace_graph_ret *trace,
+				struct fgraph_ops *gops)
 {
 	struct trace_array *tr = wakeup_trace;
 	struct trace_array_cpu *data;
diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index 69ee8ef12cee..8639d278b6b2 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -724,7 +724,8 @@ trace_selftest_startup_function(struct tracer *trace, struct trace_array *tr)
 static unsigned int graph_hang_thresh;
 
 /* Wrap the real function entry probe to avoid possible hanging */
-static int trace_graph_entry_watchdog(struct ftrace_graph_ent *trace)
+static int trace_graph_entry_watchdog(struct ftrace_graph_ent *trace,
+				      struct fgraph_ops *gops)
 {
 	/* This is harmlessly racy, we want to approximately detect a hang */
 	if (unlikely(++graph_hang_thresh > GRAPH_MAX_FUNC_TEST)) {
@@ -738,7 +739,7 @@ static int trace_graph_entry_watchdog(struct ftrace_graph_ent *trace)
 		return 0;
 	}
 
-	return trace_graph_entry(trace);
+	return trace_graph_entry(trace, gops);
 }
 
 static struct fgraph_ops fgraph_ops __initdata  = {
-- 
2.20.1




* [RFC][PATCH 06/14 v2] ftrace: Allow function_graph tracer to be enabled in instances
  2019-05-20 14:20 [RFC][PATCH 00/14 v2] function_graph: Rewrite to allow multiple users Steven Rostedt
                   ` (4 preceding siblings ...)
  2019-05-20 14:20 ` [RFC][PATCH 05/14 v2] ftrace/function_graph: Pass fgraph_ops to function graph callbacks Steven Rostedt
@ 2019-05-20 14:20 ` Steven Rostedt
  2019-05-20 14:20 ` [RFC][PATCH 07/14 v2] ftrace: Allow ftrace startup flags exist without dynamic ftrace Steven Rostedt
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Steven Rostedt @ 2019-05-20 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Namhyung Kim,
	Frank Ch. Eigler

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Now that function graph tracing can handle more than one user, allow it to
be enabled in the ftrace instances. Note, the filtering of functions is
still shared with the top level set_ftrace_filter and friends, as well as the
graph and nograph files.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
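
The switch from a single global graph_array to gops->private can be modeled
as below. This is a simplified sketch with hypothetical demo_* types, not
the trace_array code itself: each instance owns an ops structure whose
private pointer leads back to that instance.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct trace_array */
struct demo_instance {
	const char	*name;
	unsigned long	events;
};

/* Stand-in for struct fgraph_ops with the new private field */
struct demo_gops {
	int	(*entryfunc)(unsigned long func, struct demo_gops *gops);
	void	*private;
};

/* The instance comes from the ops, not from a file-scope pointer */
static int demo_graph_entry(unsigned long func, struct demo_gops *gops)
{
	struct demo_instance *tr = gops->private;

	(void)func;
	assert(tr != NULL);
	tr->events++;
	return 1;
}

/* Tie an ops structure to the instance that owns it */
static void demo_init_gops(struct demo_gops *gops, struct demo_instance *tr)
{
	gops->entryfunc = demo_graph_entry;
	gops->private = tr;
}
```

Two instances set up this way record into their own counters even though
they share demo_graph_entry(), which mirrors how trace_graph_entry() and
trace_graph_return() now look up tr.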
 include/linux/ftrace.h               |  1 +
 kernel/trace/ftrace.c                |  1 +
 kernel/trace/trace.h                 | 12 +++++
 kernel/trace/trace_functions.c       |  7 +++
 kernel/trace/trace_functions_graph.c | 65 +++++++++++++++++-----------
 kernel/trace/trace_selftest.c        |  2 +-
 6 files changed, 62 insertions(+), 26 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 906f7c25faa6..766c565ba243 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -752,6 +752,7 @@ extern int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace, struct fgraph
 struct fgraph_ops {
 	trace_func_graph_ent_t		entryfunc;
 	trace_func_graph_ret_t		retfunc;
+	void				*private;
 };
 
 /*
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 64b60217d037..d672df0229da 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -6224,6 +6224,7 @@ __init void ftrace_init_global_array_ops(struct trace_array *tr)
 	tr->ops = &global_ops;
 	tr->ops->private = tr;
 	ftrace_init_trace_array(tr);
+	init_array_fgraph_ops(tr);
 }
 
 void ftrace_init_array_ops(struct trace_array *tr, ftrace_func_t func)
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 48f152ca3558..e4405809d0c5 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -313,6 +313,9 @@ struct trace_array {
 #ifdef CONFIG_FUNCTION_TRACER
 	struct ftrace_ops	*ops;
 	struct trace_pid_list	__rcu *function_pids;
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+	struct fgraph_ops	*gops;
+#endif
 #ifdef CONFIG_DYNAMIC_FTRACE
 	/* All of these are protected by the ftrace_lock */
 	struct list_head	func_probes;
@@ -930,6 +933,9 @@ extern int __trace_graph_entry(struct trace_array *tr,
 extern void __trace_graph_return(struct trace_array *tr,
 				 struct ftrace_graph_ret *trace,
 				 unsigned long flags, int pc);
+extern void init_array_fgraph_ops(struct trace_array *tr);
+extern int allocate_fgraph_ops(struct trace_array *tr);
+extern void free_fgraph_ops(struct trace_array *tr);
 
 #ifdef CONFIG_DYNAMIC_FTRACE
 extern struct ftrace_hash *ftrace_graph_hash;
@@ -1023,6 +1029,12 @@ print_graph_function_flags(struct trace_iterator *iter, u32 flags)
 {
 	return TRACE_TYPE_UNHANDLED;
 }
+static inline void init_array_fgraph_ops(struct trace_array *tr) { }
+static inline int allocate_fgraph_ops(struct trace_array *tr)
+{
+	return 0;
+}
+static inline void free_fgraph_ops(struct trace_array *tr) { }
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
 
 extern struct list_head ftrace_pids;
diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index b611cd36e22d..9b45ede6ea89 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -68,6 +68,12 @@ int ftrace_create_function_files(struct trace_array *tr,
 	if (ret)
 		return ret;
 
+	ret = allocate_fgraph_ops(tr);
+	if (ret) {
+		kfree(tr->ops);
+		return ret;
+	}
+
 	ftrace_create_filter_files(tr->ops, parent);
 
 	return 0;
@@ -78,6 +84,7 @@ void ftrace_destroy_function_files(struct trace_array *tr)
 	ftrace_destroy_filter_files(tr->ops);
 	kfree(tr->ops);
 	tr->ops = NULL;
+	free_fgraph_ops(tr);
 }
 
 static int function_trace_init(struct trace_array *tr)
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 2ae21788fcaf..064811ba846c 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -77,8 +77,6 @@ static struct tracer_flags tracer_flags = {
 	.opts = trace_opts
 };
 
-static struct trace_array *graph_array;
-
 /*
  * DURATION column is being also used to display IRQ signs,
  * following values are used by print_graph_irq and others
@@ -127,7 +125,7 @@ static inline int ftrace_graph_ignore_irqs(void)
 int trace_graph_entry(struct ftrace_graph_ent *trace,
 		      struct fgraph_ops *gops)
 {
-	struct trace_array *tr = graph_array;
+	struct trace_array *tr = gops->private;
 	struct trace_array_cpu *data;
 	unsigned long flags;
 	long disabled;
@@ -241,7 +239,7 @@ void __trace_graph_return(struct trace_array *tr,
 void trace_graph_return(struct ftrace_graph_ret *trace,
 			struct fgraph_ops *gops)
 {
-	struct trace_array *tr = graph_array;
+	struct trace_array *tr = gops->private;
 	struct trace_array_cpu *data;
 	unsigned long flags;
 	long disabled;
@@ -267,15 +265,6 @@ void trace_graph_return(struct ftrace_graph_ret *trace,
 	local_irq_restore(flags);
 }
 
-void set_graph_array(struct trace_array *tr)
-{
-	graph_array = tr;
-
-	/* Make graph_array visible before we start tracing */
-
-	smp_mb();
-}
-
 static void trace_graph_thresh_return(struct ftrace_graph_ret *trace,
 				      struct fgraph_ops *gops)
 {
@@ -293,25 +282,53 @@ static void trace_graph_thresh_return(struct ftrace_graph_ret *trace,
 		trace_graph_return(trace, gops);
 }
 
-static struct fgraph_ops funcgraph_thresh_ops = {
-	.entryfunc = &trace_graph_entry,
-	.retfunc = &trace_graph_thresh_return,
-};
-
 static struct fgraph_ops funcgraph_ops = {
 	.entryfunc = &trace_graph_entry,
 	.retfunc = &trace_graph_return,
 };
 
+int allocate_fgraph_ops(struct trace_array *tr)
+{
+	struct fgraph_ops *gops;
+
+	gops = kzalloc(sizeof(*gops), GFP_KERNEL);
+	if (!gops)
+		return -ENOMEM;
+
+	gops->entryfunc = &trace_graph_entry;
+	gops->retfunc = &trace_graph_return;
+
+	tr->gops = gops;
+	gops->private = tr;
+	return 0;
+}
+
+void free_fgraph_ops(struct trace_array *tr)
+{
+	kfree(tr->gops);
+}
+
+__init void init_array_fgraph_ops(struct trace_array *tr)
+{
+	tr->gops = &funcgraph_ops;
+	funcgraph_ops.private = tr;
+}
+
 static int graph_trace_init(struct trace_array *tr)
 {
 	int ret;
 
-	set_graph_array(tr);
+	tr->gops->entryfunc = trace_graph_entry;
+
 	if (tracing_thresh)
-		ret = register_ftrace_graph(&funcgraph_thresh_ops);
+		tr->gops->retfunc = trace_graph_thresh_return;
 	else
-		ret = register_ftrace_graph(&funcgraph_ops);
+		tr->gops->retfunc = trace_graph_return;
+
+	/* Make sure gops functions are visible before we start tracing */
+	smp_mb();
+
+	ret = register_ftrace_graph(tr->gops);
 	if (ret)
 		return ret;
 	tracing_start_cmdline_record();
@@ -322,10 +339,7 @@ static int graph_trace_init(struct trace_array *tr)
 static void graph_trace_reset(struct trace_array *tr)
 {
 	tracing_stop_cmdline_record();
-	if (tracing_thresh)
-		unregister_ftrace_graph(&funcgraph_thresh_ops);
-	else
-		unregister_ftrace_graph(&funcgraph_ops);
+	unregister_ftrace_graph(tr->gops);
 }
 
 static int graph_trace_update_thresh(struct trace_array *tr)
@@ -1297,6 +1311,7 @@ static struct tracer graph_trace __tracer_data = {
 	.print_header	= print_graph_headers,
 	.flags		= &tracer_flags,
 	.set_flag	= func_graph_set_flag,
+	.allow_instances = true,
 #ifdef CONFIG_FTRACE_SELFTEST
 	.selftest	= trace_selftest_startup_function_graph,
 #endif
diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index 8639d278b6b2..facd5d1c05e7 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -770,7 +770,7 @@ trace_selftest_startup_function_graph(struct tracer *trace,
 	 * to detect and recover from possible hangs
 	 */
 	tracing_reset_online_cpus(&tr->trace_buffer);
-	set_graph_array(tr);
+	fgraph_ops.private = tr;
 	ret = register_ftrace_graph(&fgraph_ops);
 	if (ret) {
 		warn_failed_init_tracer(trace, ret);
-- 
2.20.1
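
The core of the change above is that each callback finds its trace_array
through gops->private instead of a single file-scope graph_array pointer.
A minimal user-space sketch of that pattern follows. It is not kernel code:
every demo_ name is hypothetical and the callback signature is simplified.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct trace_array: just counts events. */
struct demo_trace_array {
	int events;
};

struct demo_gops;
typedef int (*demo_entry_fn)(unsigned long func, struct demo_gops *gops);

/* Hypothetical stand-in for struct fgraph_ops with its private pointer. */
struct demo_gops {
	demo_entry_fn	entryfunc;
	void		*private;
};

/* The callback reaches its instance via gops->private, not a global. */
static int demo_graph_entry(unsigned long func, struct demo_gops *gops)
{
	struct demo_trace_array *tr = gops->private;

	(void)func;
	tr->events++;
	return 1;
}

/* Bind one ops structure to one instance, as allocate_fgraph_ops() does. */
static void demo_init_gops(struct demo_gops *gops, struct demo_trace_array *tr)
{
	assert(gops != NULL && tr != NULL);
	gops->entryfunc = demo_graph_entry;
	gops->private = tr;
}
```

With two demo_gops bound to two instances, calling each entryfunc only
bumps the counter of the instance it was bound to, which is what lets
several instances run the tracer at the same time.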




* [RFC][PATCH 07/14 v2] ftrace: Allow ftrace startup flags exist without dynamic ftrace
  2019-05-20 14:20 [RFC][PATCH 00/14 v2] function_graph: Rewrite to allow multiple users Steven Rostedt
                   ` (5 preceding siblings ...)
  2019-05-20 14:20 ` [RFC][PATCH 06/14 v2] ftrace: Allow function_graph tracer to be enabled in instances Steven Rostedt
@ 2019-05-20 14:20 ` Steven Rostedt
  2019-05-20 14:20 ` [RFC][PATCH 08/14 v2] function_graph: Have the instances use their own ftrace_ops for filtering Steven Rostedt
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Steven Rostedt @ 2019-05-20 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Namhyung Kim,
	Frank Ch. Eigler

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Some of the flags for ftrace_startup() may be exposed even when
CONFIG_DYNAMIC_FTRACE is not configured in. This is fine, as the difference
between dynamic ftrace and static ftrace is handled within the internals of
ftrace itself. There is no need to have callers fail to compile because
dynamic ftrace is disabled.

This change is needed to move some of the logic of what is passed to
ftrace_startup() out of the parameters of ftrace_startup().

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
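
A minimal user-space sketch of why the enum moves out of the #ifdef. It is
not kernel code: the DEMO_ names and the DEMO_DYNAMIC switch are
hypothetical, and only the flag values are taken from the patch. The point
is that the caller compiles unchanged whichever implementation is built.

```c
#include <assert.h>

/* Flag values mirror the enum in the patch; DEMO_ names are hypothetical. */
enum {
	DEMO_UPDATE_CALLS	= (1 << 0),
	DEMO_DISABLE_CALLS	= (1 << 1),
	DEMO_START_FUNC_RET	= (1 << 3),
	DEMO_STOP_FUNC_RET	= (1 << 4),
};

#define DEMO_ALL_FLAGS (DEMO_UPDATE_CALLS | DEMO_DISABLE_CALLS | \
			DEMO_START_FUNC_RET | DEMO_STOP_FUNC_RET)

#ifdef DEMO_DYNAMIC
/* Dynamic build: would patch call sites; here it only reports the request. */
static int demo_startup(int command)
{
	assert((command & ~DEMO_ALL_FLAGS) == 0);
	return command;
}
#else
/* Static build: nothing to patch, the flags are accepted and ignored. */
static int demo_startup(int command)
{
	assert((command & ~DEMO_ALL_FLAGS) == 0);
	return 0;
}
#endif

/* Caller code is identical under both configurations. */
static int demo_register(int first_user)
{
	int command = first_user ? DEMO_START_FUNC_RET : 0;

	return demo_startup(command);
}
```

Had the enum stayed inside the #ifdef, demo_register() would fail to build
whenever DEMO_DYNAMIC is undefined, even though the static demo_startup()
does not care about the flag.
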
 include/linux/ftrace.h | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 766c565ba243..d0307c9b866e 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -286,6 +286,15 @@ static inline void stack_tracer_disable(void) { }
 static inline void stack_tracer_enable(void) { }
 #endif
 
+enum {
+	FTRACE_UPDATE_CALLS		= (1 << 0),
+	FTRACE_DISABLE_CALLS		= (1 << 1),
+	FTRACE_UPDATE_TRACE_FUNC	= (1 << 2),
+	FTRACE_START_FUNC_RET		= (1 << 3),
+	FTRACE_STOP_FUNC_RET		= (1 << 4),
+	FTRACE_MAY_SLEEP		= (1 << 5),
+};
+
 #ifdef CONFIG_DYNAMIC_FTRACE
 
 int ftrace_arch_code_modify_prepare(void);
@@ -373,15 +382,6 @@ void ftrace_set_global_notrace(unsigned char *buf, int len, int reset);
 void ftrace_free_filter(struct ftrace_ops *ops);
 void ftrace_ops_set_global_filter(struct ftrace_ops *ops);
 
-enum {
-	FTRACE_UPDATE_CALLS		= (1 << 0),
-	FTRACE_DISABLE_CALLS		= (1 << 1),
-	FTRACE_UPDATE_TRACE_FUNC	= (1 << 2),
-	FTRACE_START_FUNC_RET		= (1 << 3),
-	FTRACE_STOP_FUNC_RET		= (1 << 4),
-	FTRACE_MAY_SLEEP		= (1 << 5),
-};
-
 /*
  * The FTRACE_UPDATE_* enum is used to pass information back
  * from the ftrace_update_record() and ftrace_test_record()
-- 
2.20.1




* [RFC][PATCH 08/14 v2] function_graph: Have the instances use their own ftrace_ops for filtering
  2019-05-20 14:20 [RFC][PATCH 00/14 v2] function_graph: Rewrite to allow multiple users Steven Rostedt
                   ` (6 preceding siblings ...)
  2019-05-20 14:20 ` [RFC][PATCH 07/14 v2] ftrace: Allow ftrace startup flags exist without dynamic ftrace Steven Rostedt
@ 2019-05-20 14:20 ` Steven Rostedt
  2019-05-20 14:20 ` [RFC][PATCH 09/14 v2] function_graph: Add "task variables" per task for fgraph_ops Steven Rostedt
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Steven Rostedt @ 2019-05-20 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Namhyung Kim,
	Frank Ch. Eigler

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Allow instances to have their own ftrace_ops as part of the fgraph_ops, so
that the function_graph tracer filters on the set_ftrace_filter file of the
instance and not that of the top instance.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
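
A minimal user-space sketch of the per-ops filtering done in the
function_graph_enter() loop. It is not kernel code: all demo_ names are
hypothetical, the hash is replaced by a plain array, and the sketch assumes
that an empty filter means "trace every function".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for fgraph_ops with its embedded filter. */
struct demo_gops {
	const unsigned long	*filter;	/* NULL or empty: trace all */
	size_t			nr_filter;
	int			hits;
};

/* Stand-in for ftrace_ops_test(): does this ops want this function? */
static int demo_ops_test(const struct demo_gops *gops, unsigned long func)
{
	size_t i;

	if (!gops->filter || !gops->nr_filter)
		return 1;

	for (i = 0; i < gops->nr_filter; i++) {
		if (gops->filter[i] == func)
			return 1;
	}
	return 0;
}

/* Stand-in for the entry callback; returns nonzero to trace the return. */
static int demo_entry(struct demo_gops *gops, unsigned long func)
{
	(void)func;
	gops->hits++;
	return 1;
}

/* Each registered ops is tested against its own filter before its callback. */
static int demo_function_enter(struct demo_gops **array, size_t cnt,
			       unsigned long func)
{
	int called = 0;
	size_t i;

	assert(array != NULL);
	for (i = 0; i < cnt; i++) {
		if (demo_ops_test(array[i], func) &&
		    demo_entry(array[i], func))
			called++;
	}
	return called;
}
```

An ops with an empty filter sees every function, while an ops whose filter
holds a single address is only called for that address, independent of what
the other registered ops asked for.
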
 include/linux/ftrace.h               |  1 +
 kernel/trace/fgraph.c                | 63 +++++++++++++++++-----------
 kernel/trace/ftrace.c                |  6 +--
 kernel/trace/trace.h                 | 16 +++----
 kernel/trace/trace_functions.c       |  2 +-
 kernel/trace/trace_functions_graph.c |  8 +++-
 6 files changed, 59 insertions(+), 37 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index d0307c9b866e..e6a596e7cdf4 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -752,6 +752,7 @@ extern int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace, struct fgraph
 struct fgraph_ops {
 	trace_func_graph_ent_t		entryfunc;
 	trace_func_graph_ret_t		retfunc;
+	struct ftrace_ops		ops; /* for the hash lists */
 	void				*private;
 };
 
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 0af9d40c4363..06511f5192b6 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -15,14 +15,6 @@
 
 #include "ftrace_internal.h"
 
-#ifdef CONFIG_DYNAMIC_FTRACE
-#define ASSIGN_OPS_HASH(opsname, val) \
-	.func_hash		= val, \
-	.local_hash.regex_lock	= __MUTEX_INITIALIZER(opsname.local_hash.regex_lock),
-#else
-#define ASSIGN_OPS_HASH(opsname, val)
-#endif
-
 #define FGRAPH_RET_SIZE (sizeof(struct ftrace_ret_stack))
 #define FGRAPH_RET_INDEX (ALIGN(FGRAPH_RET_SIZE, sizeof(long)) / sizeof(long))
 
@@ -299,9 +291,6 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 	int cnt = 0;
 	int i;
 
-	if (!ftrace_ops_test(&global_ops, func, NULL))
-		goto out;
-
 	trace.func = func;
 	trace.depth = ++current->curr_ret_depth;
 
@@ -322,7 +311,8 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 			atomic_inc(&current->trace_overrun);
 			break;
 		}
-		if (fgraph_array[i]->entryfunc(&trace, fgraph_array[i])) {
+		if (ftrace_ops_test(&gops->ops, func, NULL) &&
+		    gops->entryfunc(&trace, gops)) {
 			offset = current->curr_ret_stack;
 			/* Check the top level stored word */
 			type = get_fgraph_type(current, offset - 1);
@@ -593,18 +583,27 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
 }
 #endif /* HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */
 
-static struct ftrace_ops graph_ops = {
-	.func			= ftrace_stub,
-	.flags			= FTRACE_OPS_FL_RECURSION_SAFE |
-				   FTRACE_OPS_FL_INITIALIZED |
-				   FTRACE_OPS_FL_PID |
-				   FTRACE_OPS_FL_STUB,
+void fgraph_init_ops(struct ftrace_ops *dst_ops,
+		     struct ftrace_ops *src_ops)
+{
+	dst_ops->func = ftrace_stub;
+	dst_ops->flags = FTRACE_OPS_FL_RECURSION_SAFE |
+		FTRACE_OPS_FL_PID |
+		FTRACE_OPS_FL_STUB;
+
 #ifdef FTRACE_GRAPH_TRAMP_ADDR
-	.trampoline		= FTRACE_GRAPH_TRAMP_ADDR,
+	dst_ops->trampoline = FTRACE_GRAPH_TRAMP_ADDR;
 	/* trampoline_size is only needed for dynamically allocated tramps */
 #endif
-	ASSIGN_OPS_HASH(graph_ops, &global_ops.local_hash)
-};
+
+#ifdef CONFIG_DYNAMIC_FTRACE
+	if (src_ops) {
+		dst_ops->func_hash = &src_ops->local_hash;
+		mutex_init(&dst_ops->local_hash.regex_lock);
+		dst_ops->flags |= FTRACE_OPS_FL_INITIALIZED;
+	}
+#endif
+}
 
 void ftrace_graph_sleep_time_control(bool enable)
 {
@@ -803,11 +802,20 @@ static int start_graph_tracing(void)
 
 int register_ftrace_graph(struct fgraph_ops *gops)
 {
+	int command = 0;
 	int ret = 0;
 	int i;
 
 	mutex_lock(&ftrace_lock);
 
+	if (!gops->ops.func) {
+		gops->ops.flags |= FTRACE_OPS_FL_STUB;
+		gops->ops.func = ftrace_stub;
+#ifdef FTRACE_GRAPH_TRAMP_ADDR
+		gops->ops.trampoline = FTRACE_GRAPH_TRAMP_ADDR;
+#endif
+	}
+
 	if (!fgraph_array[0]) {
 		/* The array must always have real data on it */
 		for (i = 0; i < FGRAPH_ARRAY_SIZE; i++) {
@@ -844,9 +852,10 @@ int register_ftrace_graph(struct fgraph_ops *gops)
 		 */
 		ftrace_graph_return = return_run;
 		ftrace_graph_entry = entry_run;
-
-		ret = ftrace_startup(&graph_ops, FTRACE_START_FUNC_RET);
+		command = FTRACE_START_FUNC_RET;
 	}
+
+	ret = ftrace_startup(&gops->ops, command);
 out:
 	mutex_unlock(&ftrace_lock);
 	return ret;
@@ -854,6 +863,7 @@ int register_ftrace_graph(struct fgraph_ops *gops)
 
 void unregister_ftrace_graph(struct fgraph_ops *gops)
 {
+	int command = 0;
 	int i;
 
 	mutex_lock(&ftrace_lock);
@@ -876,10 +886,15 @@ void unregister_ftrace_graph(struct fgraph_ops *gops)
 	}
 
 	ftrace_graph_active--;
+
+	if (!ftrace_graph_active)
+		command = FTRACE_STOP_FUNC_RET;
+
+	ftrace_shutdown(&gops->ops, command);
+
 	if (!ftrace_graph_active) {
 		ftrace_graph_return = (trace_func_graph_ret_t)ftrace_stub;
 		ftrace_graph_entry = ftrace_graph_entry_stub;
-		ftrace_shutdown(&graph_ops, FTRACE_STOP_FUNC_RET);
 		unregister_pm_notifier(&ftrace_suspend_notifier);
 		unregister_trace_sched_switch(ftrace_graph_probe_sched_switch, NULL);
 	}
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index d672df0229da..d48a1e39f6cb 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2673,6 +2673,8 @@ int ftrace_startup(struct ftrace_ops *ops, int command)
 	if (unlikely(ftrace_disabled))
 		return -ENODEV;
 
+	ftrace_ops_init(ops);
+
 	ret = __register_ftrace_function(ops);
 	if (ret)
 		return ret;
@@ -6224,7 +6226,7 @@ __init void ftrace_init_global_array_ops(struct trace_array *tr)
 	tr->ops = &global_ops;
 	tr->ops->private = tr;
 	ftrace_init_trace_array(tr);
-	init_array_fgraph_ops(tr);
+	init_array_fgraph_ops(tr, tr->ops);
 }
 
 void ftrace_init_array_ops(struct trace_array *tr, ftrace_func_t func)
@@ -6677,8 +6679,6 @@ int register_ftrace_function(struct ftrace_ops *ops)
 {
 	int ret = -1;
 
-	ftrace_ops_init(ops);
-
 	mutex_lock(&ftrace_lock);
 
 	ret = ftrace_startup(ops, 0);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index e4405809d0c5..73eb570eb24c 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -933,8 +933,8 @@ extern int __trace_graph_entry(struct trace_array *tr,
 extern void __trace_graph_return(struct trace_array *tr,
 				 struct ftrace_graph_ret *trace,
 				 unsigned long flags, int pc);
-extern void init_array_fgraph_ops(struct trace_array *tr);
-extern int allocate_fgraph_ops(struct trace_array *tr);
+extern void init_array_fgraph_ops(struct trace_array *tr, struct ftrace_ops *ops);
+extern int allocate_fgraph_ops(struct trace_array *tr, struct ftrace_ops *ops);
 extern void free_fgraph_ops(struct trace_array *tr);
 
 #ifdef CONFIG_DYNAMIC_FTRACE
@@ -998,6 +998,7 @@ static inline int ftrace_graph_notrace_addr(unsigned long addr)
 	preempt_enable_notrace();
 	return ret;
 }
+
 #else
 static inline int ftrace_graph_addr(struct ftrace_graph_ent *trace)
 {
@@ -1023,18 +1024,19 @@ static inline bool ftrace_graph_ignore_func(struct ftrace_graph_ent *trace)
 		(fgraph_max_depth && trace->depth >= fgraph_max_depth);
 }
 
+void fgraph_init_ops(struct ftrace_ops *dst_ops,
+		     struct ftrace_ops *src_ops);
+
 #else /* CONFIG_FUNCTION_GRAPH_TRACER */
 static inline enum print_line_t
 print_graph_function_flags(struct trace_iterator *iter, u32 flags)
 {
 	return TRACE_TYPE_UNHANDLED;
 }
-static inline void init_array_fgraph_ops(struct trace_array *tr) { }
-static inline int allocate_fgraph_ops(struct trace_array *tr)
-{
-	return 0;
-}
 static inline void free_fgraph_ops(struct trace_array *tr) { }
+/* ftrace_ops may not be defined */
+#define init_array_fgraph_ops(tr, ops) do { } while (0)
+#define allocate_fgraph_ops(tr, ops) ({ 0; })
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
 
 extern struct list_head ftrace_pids;
diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index 9b45ede6ea89..cfe1dc27a677 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -68,7 +68,7 @@ int ftrace_create_function_files(struct trace_array *tr,
 	if (ret)
 		return ret;
 
-	ret = allocate_fgraph_ops(tr);
+	ret = allocate_fgraph_ops(tr, tr->ops);
 	if (ret) {
 		kfree(tr->ops);
 		return ret;
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 064811ba846c..0434e6052650 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -287,7 +287,7 @@ static struct fgraph_ops funcgraph_ops = {
 	.retfunc = &trace_graph_return,
 };
 
-int allocate_fgraph_ops(struct trace_array *tr)
+int allocate_fgraph_ops(struct trace_array *tr, struct ftrace_ops *ops)
 {
 	struct fgraph_ops *gops;
 
@@ -300,6 +300,9 @@ int allocate_fgraph_ops(struct trace_array *tr)
 
 	tr->gops = gops;
 	gops->private = tr;
+
+	fgraph_init_ops(&gops->ops, ops);
+
 	return 0;
 }
 
@@ -308,10 +311,11 @@ void free_fgraph_ops(struct trace_array *tr)
 	kfree(tr->gops);
 }
 
-__init void init_array_fgraph_ops(struct trace_array *tr)
+__init void init_array_fgraph_ops(struct trace_array *tr, struct ftrace_ops *ops)
 {
 	tr->gops = &funcgraph_ops;
 	funcgraph_ops.private = tr;
+	fgraph_init_ops(&tr->gops->ops, ops);
 }
 
 static int graph_trace_init(struct trace_array *tr)
-- 
2.20.1




* [RFC][PATCH 09/14 v2] function_graph: Add "task variables" per task for fgraph_ops
  2019-05-20 14:20 [RFC][PATCH 00/14 v2] function_graph: Rewrite to allow multiple users Steven Rostedt
                   ` (7 preceding siblings ...)
  2019-05-20 14:20 ` [RFC][PATCH 08/14 v2] function_graph: Have the instances use their own ftrace_ops for filtering Steven Rostedt
@ 2019-05-20 14:20 ` Steven Rostedt
  2019-05-20 14:20 ` [RFC][PATCH 10/14 v2] function_graph: Move set_graph_function tests to shadow stack global var Steven Rostedt
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Steven Rostedt @ 2019-05-20 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Namhyung Kim,
	Frank Ch. Eigler

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Add a "task variables" array at the end of each task's shadow ret_stack,
with one long reserved for each possible registered fgraph_ops. That's a
total of 16 entries, taking up 8 * 16 = 128 bytes (out of a 4k page).

This gives each fgraph_ops a way to maintain state on a per-task basis.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
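
A minimal user-space sketch of the layout, not kernel code. All DEMO_/demo_
names are hypothetical, and the sketch assumes a 4096 byte shadow stack and
16 slots, matching the numbers in the changelog. With 8-byte longs that is
512 words, the last 16 of which hold the task variables.

```c
#include <assert.h>
#include <string.h>

#define DEMO_STACK_BYTES	4096	/* assumed shadow stack size */
#define DEMO_ARRAY_SIZE		16	/* assumed number of fgraph_ops slots */
#define DEMO_STACK_INDEX	(DEMO_STACK_BYTES / sizeof(long))

/* The task variables occupy the last DEMO_ARRAY_SIZE words of the stack. */
#define DEMO_TASK_VARS(ret_stack) \
	(&(ret_stack)[DEMO_STACK_INDEX - DEMO_ARRAY_SIZE])

/* Hypothetical stand-in for the ret_stack hanging off task_struct. */
struct demo_task {
	unsigned long ret_stack[DEMO_STACK_INDEX];
};

/* Zero every slot, as done when a ret_stack is handed to a task. */
static void demo_init_task_vars(unsigned long *ret_stack)
{
	memset(DEMO_TASK_VARS(ret_stack), 0,
	       sizeof(unsigned long) * DEMO_ARRAY_SIZE);
}

/* Return the address of the slot owned by the ops registered at @idx. */
static unsigned long *demo_get_task_var(struct demo_task *t, int idx)
{
	assert(idx >= 0 && idx < DEMO_ARRAY_SIZE);
	return &DEMO_TASK_VARS(t->ret_stack)[idx];
}
```

Because the slots live at the top of the stack, the usable index range for
return entries has to shrink by the same amount, which is what the change
to SHADOW_STACK_MAX_INDEX below accounts for.
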
 include/linux/ftrace.h |  2 ++
 kernel/trace/fgraph.c  | 73 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index e6a596e7cdf4..a0bdd1745e56 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -754,6 +754,7 @@ struct fgraph_ops {
 	trace_func_graph_ret_t		retfunc;
 	struct ftrace_ops		ops; /* for the hash lists */
 	void				*private;
+	int				idx;
 };
 
 /*
@@ -792,6 +793,7 @@ ftrace_graph_get_ret_stack(struct task_struct *task, int idx);
 
 unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
 				    unsigned long ret, unsigned long *retp);
+unsigned long *fgraph_get_task_var(struct fgraph_ops *gops);
 
 int function_graph_enter(unsigned long ret, unsigned long func,
 			 unsigned long frame_pointer, unsigned long *retp);
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 06511f5192b6..c225b04bcd00 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -91,10 +91,18 @@ enum {
 #define SHADOW_STACK_INDEX			\
 	(ALIGN(SHADOW_STACK_SIZE, sizeof(long)) / sizeof(long))
 /* Leave on a buffer at the end */
-#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - (FGRAPH_RET_INDEX + 1))
+#define SHADOW_STACK_MAX_INDEX				\
+	(SHADOW_STACK_INDEX - (FGRAPH_RET_INDEX + 1 + FGRAPH_ARRAY_SIZE))
 
 #define RET_STACK(t, index) ((struct ftrace_ret_stack *)(&(t)->ret_stack[index]))
 
+/*
+ * Each fgraph_ops has a reserved unsigned long at the end (top) of the
+ * ret_stack to store task specific state.
+ */
+#define SHADOW_STACK_TASK_VARS(ret_stack) \
+	((unsigned long *)(&(ret_stack)[SHADOW_STACK_INDEX - FGRAPH_ARRAY_SIZE]))
+
 static bool kill_ftrace_graph;
 int ftrace_graph_active;
 
@@ -131,6 +139,44 @@ static void return_run(struct ftrace_graph_ret *trace, struct fgraph_ops *ops)
 	return;
 }
 
+static void ret_stack_set_task_var(struct task_struct *t, int idx, long val)
+{
+	unsigned long *gvals = SHADOW_STACK_TASK_VARS(t->ret_stack);
+
+	gvals[idx] = val;
+}
+
+static unsigned long *
+ret_stack_get_task_var(struct task_struct *t, int idx)
+{
+	unsigned long *gvals = SHADOW_STACK_TASK_VARS(t->ret_stack);
+
+	return &gvals[idx];
+}
+
+static void ret_stack_init_task_vars(unsigned long *ret_stack)
+{
+	unsigned long *gvals = SHADOW_STACK_TASK_VARS(ret_stack);
+
+	memset(gvals, 0, sizeof(*gvals) * FGRAPH_ARRAY_SIZE);
+}
+
+/**
+ * fgraph_get_task_var - retrieve a task specific state variable
+ * @gops: The fgraph_ops that owns the task specific variable
+ *
+ * Every registered fgraph_ops has a task state variable
+ * reserved on the task's ret_stack. This function returns the
+ * address to that variable.
+ *
+ * Returns the address of the fgraph_ops @gops task specific
+ * unsigned long variable.
+ */
+unsigned long *fgraph_get_task_var(struct fgraph_ops *gops)
+{
+	return ret_stack_get_task_var(current, gops->idx);
+}
+
 /*
  * @offset: The index into @t->ret_stack to find the ret_stack entry
  * @index: Where to place the index into @t->ret_stack of that entry
@@ -643,6 +689,7 @@ static int alloc_retstack_tasklist(unsigned long **ret_stack_list)
 		if (t->ret_stack == NULL) {
 			atomic_set(&t->tracing_graph_pause, 0);
 			atomic_set(&t->trace_overrun, 0);
+			ret_stack_init_task_vars(ret_stack_list[start]);
 			t->curr_ret_stack = 0;
 			t->curr_ret_depth = -1;
 			/* Make sure the tasks see the 0 first: */
@@ -702,6 +749,7 @@ graph_init_task(struct task_struct *t, unsigned long *ret_stack)
 {
 	atomic_set(&t->tracing_graph_pause, 0);
 	atomic_set(&t->trace_overrun, 0);
+	ret_stack_init_task_vars(ret_stack);
 	t->ftrace_timestamp = 0;
 	t->curr_ret_stack = 0;
 	t->curr_ret_depth = -1;
@@ -800,6 +848,24 @@ static int start_graph_tracing(void)
 	return ret;
 }
 
+static void init_task_vars(int idx)
+{
+	struct task_struct *g, *t;
+	int cpu;
+
+	for_each_online_cpu(cpu) {
+		if (idle_task(cpu)->ret_stack)
+			ret_stack_set_task_var(idle_task(cpu), idx, 0);
+	}
+
+	read_lock(&tasklist_lock);
+	do_each_thread(g, t) {
+		if (t->ret_stack)
+			ret_stack_set_task_var(t, idx, 0);
+	} while_each_thread(g, t);
+	read_unlock(&tasklist_lock);
+}
+
 int register_ftrace_graph(struct fgraph_ops *gops)
 {
 	int command = 0;
@@ -836,6 +902,7 @@ int register_ftrace_graph(struct fgraph_ops *gops)
 	fgraph_array[i] = gops;
 	if (i + 1 > fgraph_array_cnt)
 		fgraph_array_cnt = i + 1;
+	gops->idx = i;
 
 	ftrace_graph_active++;
 
@@ -853,6 +920,8 @@ int register_ftrace_graph(struct fgraph_ops *gops)
 		ftrace_graph_return = return_run;
 		ftrace_graph_entry = entry_run;
 		command = FTRACE_START_FUNC_RET;
+	} else {
+		init_task_vars(gops->idx);
 	}
 
 	ret = ftrace_startup(&gops->ops, command);
@@ -877,6 +946,8 @@ void unregister_ftrace_graph(struct fgraph_ops *gops)
 	if (i >= fgraph_array_cnt)
 		goto out;
 
+	WARN_ON_ONCE(gops->idx != i);
+
 	fgraph_array[i] = &fgraph_stub;
 	if (i + 1 == fgraph_array_cnt) {
 		for (; i >= 0; i--)
-- 
2.20.1




* [RFC][PATCH 10/14 v2] function_graph: Move set_graph_function tests to shadow stack global var
  2019-05-20 14:20 [RFC][PATCH 00/14 v2] function_graph: Rewrite to allow multiple users Steven Rostedt
                   ` (8 preceding siblings ...)
  2019-05-20 14:20 ` [RFC][PATCH 09/14 v2] function_graph: Add "task variables" per task for fgraph_ops Steven Rostedt
@ 2019-05-20 14:20 ` Steven Rostedt
  2019-05-20 14:20 ` [RFC][PATCH 11/14 v2] function_graph: Move graph depth stored data " Steven Rostedt
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Steven Rostedt @ 2019-05-20 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Namhyung Kim,
	Frank Ch. Eigler

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Using task->trace_recursion for the set_graph_function logic was a bit of
an abuse of that variable. Now that each registered graph tracer has a
variable on the task's shadow stack, use that instead.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
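
A minimal user-space sketch of the flag's life cycle, not kernel code. The
demo_ names are hypothetical, the hash lookup is reduced to a boolean, and
the start depth is kept in a plain field here (at this point in the series
it still lives in trace_recursion).

```c
#include <assert.h>
#include <stdbool.h>

enum {
	DEMO_GRAPH_FL = 1,	/* mirrors TRACE_GRAPH_FL */
};

/* Hypothetical per-task state for one registered tracer. */
struct demo_state {
	unsigned long	task_var;
	int		start_depth;
};

/* Entry side: returns true when the function should NOT be traced. */
static bool demo_ignore_func(struct demo_state *s, bool in_filter, int depth)
{
	assert(depth >= 0);

	/* Already nested inside a selected function: trace it. */
	if (s->task_var & DEMO_GRAPH_FL)
		return false;

	/* A selected function: set the flag and remember where it began. */
	if (in_filter) {
		s->task_var |= DEMO_GRAPH_FL;
		s->start_depth = depth;
		return false;
	}
	return true;
}

/* Return side: clear the flag when the selected function itself returns. */
static void demo_addr_finish(struct demo_state *s, int depth)
{
	if ((s->task_var & DEMO_GRAPH_FL) && depth == s->start_depth)
		s->task_var &= ~(unsigned long)DEMO_GRAPH_FL;
}
```

Since the flag sits in the tracer's own slot rather than in the shared
trace_recursion word, two registered tracers can be inside different
selected functions without disturbing each other.
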
 kernel/trace/trace.h                 | 37 +++++++++++++++++-----------
 kernel/trace/trace_functions_graph.c |  6 ++---
 kernel/trace/trace_irqsoff.c         |  4 +--
 kernel/trace/trace_sched_wakeup.c    |  4 +--
 4 files changed, 29 insertions(+), 22 deletions(-)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 73eb570eb24c..08e79334c8ca 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -567,9 +567,6 @@ enum {
  */
 	TRACE_IRQ_BIT,
 
-	/* Set if the function is in the set_graph_function file */
-	TRACE_GRAPH_BIT,
-
 	/*
 	 * In the very unlikely case that an interrupt came in
 	 * at a start of graph tracing, and we want to trace
@@ -583,7 +580,7 @@ enum {
 	 * that preempted a softirq start of a function that
 	 * preempted normal context!!!! Luckily, it can't be
 	 * greater than 3, so the next two bits are a mask
-	 * of what the depth is when we set TRACE_GRAPH_BIT
+	 * of what the depth is when we set TRACE_GRAPH_FL
 	 */
 
 	TRACE_GRAPH_DEPTH_START_BIT,
@@ -937,11 +934,16 @@ extern void init_array_fgraph_ops(struct trace_array *tr, struct ftrace_ops *ops
 extern int allocate_fgraph_ops(struct trace_array *tr, struct ftrace_ops *ops);
 extern void free_fgraph_ops(struct trace_array *tr);
 
+enum {
+	TRACE_GRAPH_FL		= 1,
+};
+
 #ifdef CONFIG_DYNAMIC_FTRACE
 extern struct ftrace_hash *ftrace_graph_hash;
 extern struct ftrace_hash *ftrace_graph_notrace_hash;
 
-static inline int ftrace_graph_addr(struct ftrace_graph_ent *trace)
+static inline int
+ftrace_graph_addr(unsigned long *task_var, struct ftrace_graph_ent *trace)
 {
 	unsigned long addr = trace->func;
 	int ret = 0;
@@ -954,12 +956,11 @@ static inline int ftrace_graph_addr(struct ftrace_graph_ent *trace)
 	}
 
 	if (ftrace_lookup_ip(ftrace_graph_hash, addr)) {
-
 		/*
 		 * This needs to be cleared on the return functions
 		 * when the depth is zero.
 		 */
-		trace_recursion_set(TRACE_GRAPH_BIT);
+		*task_var |= TRACE_GRAPH_FL;
 		trace_recursion_set_depth(trace->depth);
 
 		/*
@@ -979,11 +980,14 @@ static inline int ftrace_graph_addr(struct ftrace_graph_ent *trace)
 	return ret;
 }
 
-static inline void ftrace_graph_addr_finish(struct ftrace_graph_ret *trace)
+static inline void
+ftrace_graph_addr_finish(struct fgraph_ops *gops, struct ftrace_graph_ret *trace)
 {
-	if (trace_recursion_test(TRACE_GRAPH_BIT) &&
+	unsigned long *task_var = fgraph_get_task_var(gops);
+
+	if ((*task_var & TRACE_GRAPH_FL) &&
 	    trace->depth == trace_recursion_depth())
-		trace_recursion_clear(TRACE_GRAPH_BIT);
+		*task_var &= ~TRACE_GRAPH_FL;
 }
 
 static inline int ftrace_graph_notrace_addr(unsigned long addr)
@@ -1000,7 +1004,7 @@ static inline int ftrace_graph_notrace_addr(unsigned long addr)
 }
 
 #else
-static inline int ftrace_graph_addr(struct ftrace_graph_ent *trace)
+static inline int ftrace_graph_addr(unsigned long *task_var, struct ftrace_graph_ent *trace)
 {
 	return 1;
 }
@@ -1009,17 +1013,20 @@ static inline int ftrace_graph_notrace_addr(unsigned long addr)
 {
 	return 0;
 }
-static inline void ftrace_graph_addr_finish(struct ftrace_graph_ret *trace)
+static inline void ftrace_graph_addr_finish(struct fgraph_ops *gops, struct ftrace_graph_ret *trace)
 { }
 #endif /* CONFIG_DYNAMIC_FTRACE */
 
 extern unsigned int fgraph_max_depth;
 
-static inline bool ftrace_graph_ignore_func(struct ftrace_graph_ent *trace)
+static inline bool
+ftrace_graph_ignore_func(struct fgraph_ops *gops, struct ftrace_graph_ent *trace)
 {
+	unsigned long *task_var = fgraph_get_task_var(gops);
+
 	/* trace it when it is-nested-in or is a function enabled. */
-	return !(trace_recursion_test(TRACE_GRAPH_BIT) ||
-		 ftrace_graph_addr(trace)) ||
+	return !((*task_var & TRACE_GRAPH_FL) ||
+		 ftrace_graph_addr(task_var, trace)) ||
 		(trace->depth < 0) ||
 		(fgraph_max_depth && trace->depth >= fgraph_max_depth);
 }
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 0434e6052650..054ec91e5086 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -148,7 +148,7 @@ int trace_graph_entry(struct ftrace_graph_ent *trace,
 	if (!ftrace_trace_task(tr))
 		return 0;
 
-	if (ftrace_graph_ignore_func(trace))
+	if (ftrace_graph_ignore_func(gops, trace))
 		return 0;
 
 	if (ftrace_graph_ignore_irqs())
@@ -246,7 +246,7 @@ void trace_graph_return(struct ftrace_graph_ret *trace,
 	int cpu;
 	int pc;
 
-	ftrace_graph_addr_finish(trace);
+	ftrace_graph_addr_finish(gops, trace);
 
 	if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT)) {
 		trace_recursion_clear(TRACE_GRAPH_NOTRACE_BIT);
@@ -268,7 +268,7 @@ void trace_graph_return(struct ftrace_graph_ret *trace,
 static void trace_graph_thresh_return(struct ftrace_graph_ret *trace,
 				      struct fgraph_ops *gops)
 {
-	ftrace_graph_addr_finish(trace);
+	ftrace_graph_addr_finish(gops, trace);
 
 	if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT)) {
 		trace_recursion_clear(TRACE_GRAPH_NOTRACE_BIT);
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index 55c547f6e31d..7e31f0a2ef58 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -181,7 +181,7 @@ static int irqsoff_graph_entry(struct ftrace_graph_ent *trace,
 	int ret;
 	int pc;
 
-	if (ftrace_graph_ignore_func(trace))
+	if (ftrace_graph_ignore_func(gops, trace))
 		return 0;
 	/*
 	 * Do not trace a function if it's filtered by set_graph_notrace.
@@ -211,7 +211,7 @@ static void irqsoff_graph_return(struct ftrace_graph_ret *trace,
 	unsigned long flags;
 	int pc;
 
-	ftrace_graph_addr_finish(trace);
+	ftrace_graph_addr_finish(gops, trace);
 
 	if (!func_prolog_dec(tr, &data, &flags))
 		return;
diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
index 9da1062a8181..a04e59f6f13f 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -120,7 +120,7 @@ static int wakeup_graph_entry(struct ftrace_graph_ent *trace,
 	unsigned long flags;
 	int pc, ret = 0;
 
-	if (ftrace_graph_ignore_func(trace))
+	if (ftrace_graph_ignore_func(gops, trace))
 		return 0;
 	/*
 	 * Do not trace a function if it's filtered by set_graph_notrace.
@@ -151,7 +151,7 @@ static void wakeup_graph_return(struct ftrace_graph_ret *trace,
 	unsigned long flags;
 	int pc;
 
-	ftrace_graph_addr_finish(trace);
+	ftrace_graph_addr_finish(gops, trace);
 
 	if (!func_prolog_preempt_disable(tr, &data, &pc))
 		return;
-- 
2.20.1




* [RFC][PATCH 11/14 v2] function_graph: Move graph depth stored data to shadow stack global var
  2019-05-20 14:20 [RFC][PATCH 00/14 v2] function_graph: Rewrite to allow multiple users Steven Rostedt
                   ` (9 preceding siblings ...)
  2019-05-20 14:20 ` [RFC][PATCH 10/14 v2] function_graph: Move set_graph_function tests to shadow stack global var Steven Rostedt
@ 2019-05-20 14:20 ` Steven Rostedt
  2019-05-20 14:20 ` [RFC][PATCH 12/14 v2] function_graph: Move graph notrace bit " Steven Rostedt
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Steven Rostedt @ 2019-05-20 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Namhyung Kim,
	Frank Ch. Eigler

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Using task->trace_recursion for the function graph depth logic was a bit
of an abuse of that variable. Now that each registered graph tracer has a
global var on the shadow stack, use that instead.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 kernel/trace/trace.h | 63 ++++++++++++++++++++++----------------------
 1 file changed, 32 insertions(+), 31 deletions(-)
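
For anyone following the bit layout, here is a minimal userspace sketch of
how the depth is packed next to TRACE_GRAPH_FL in the task variable. The
enum values mirror this patch; the names graph_depth(), graph_set_depth()
and depth_demo() are made up for illustration and are not kernel symbols.

```c
#include <assert.h>

/* Same layout as the patch: flag value 1, then two depth bits at shift 2. */
enum {
	TRACE_GRAPH_FL = 1,
	TRACE_GRAPH_DEPTH_START_BIT,	/* 2 */
	TRACE_GRAPH_DEPTH_END_BIT,	/* 3 */
};

/* Hypothetical userspace stand-in for ftrace_graph_depth(). */
static inline unsigned long graph_depth(const unsigned long *task_var)
{
	return (*task_var >> TRACE_GRAPH_DEPTH_START_BIT) & 3;
}

/* Hypothetical userspace stand-in for ftrace_graph_set_depth(). */
static inline void graph_set_depth(unsigned long *task_var, int depth)
{
	/* Clear the two depth bits, then store the new depth (0..3). */
	*task_var &= ~(3UL << TRACE_GRAPH_DEPTH_START_BIT);
	*task_var |= ((unsigned long)depth & 3) << TRACE_GRAPH_DEPTH_START_BIT;
}

/* Usage: the depth round-trips and leaves TRACE_GRAPH_FL untouched. */
static inline void depth_demo(void)
{
	unsigned long task_var = TRACE_GRAPH_FL;

	graph_set_depth(&task_var, 3);
	assert(graph_depth(&task_var) == 3);
	assert(task_var & TRACE_GRAPH_FL);
}
```

Because the depth lives in the same word as TRACE_GRAPH_FL, each fgraph_ops
carries its own copy and no longer shares bits in current->trace_recursion.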

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 08e79334c8ca..c466c8a1a8cf 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -567,25 +567,6 @@ enum {
  */
 	TRACE_IRQ_BIT,
 
-	/*
-	 * In the very unlikely case that an interrupt came in
-	 * at a start of graph tracing, and we want to trace
-	 * the function in that interrupt, the depth can be greater
-	 * than zero, because of the preempted start of a previous
-	 * trace. In an even more unlikely case, depth could be 2
-	 * if a softirq interrupted the start of graph tracing,
-	 * followed by an interrupt preempting a start of graph
-	 * tracing in the softirq, and depth can even be 3
-	 * if an NMI came in at the start of an interrupt function
-	 * that preempted a softirq start of a function that
-	 * preempted normal context!!!! Luckily, it can't be
-	 * greater than 3, so the next two bits are a mask
-	 * of what the depth is when we set TRACE_GRAPH_FL
-	 */
-
-	TRACE_GRAPH_DEPTH_START_BIT,
-	TRACE_GRAPH_DEPTH_END_BIT,
-
 	/*
 	 * To implement set_graph_notrace, if this bit is set, we ignore
 	 * function graph tracing of called functions, until the return
@@ -598,16 +579,6 @@ enum {
 #define trace_recursion_clear(bit)	do { (current)->trace_recursion &= ~(1<<(bit)); } while (0)
 #define trace_recursion_test(bit)	((current)->trace_recursion & (1<<(bit)))
 
-#define trace_recursion_depth() \
-	(((current)->trace_recursion >> TRACE_GRAPH_DEPTH_START_BIT) & 3)
-#define trace_recursion_set_depth(depth) \
-	do {								\
-		current->trace_recursion &=				\
-			~(3 << TRACE_GRAPH_DEPTH_START_BIT);		\
-		current->trace_recursion |=				\
-			((depth) & 3) << TRACE_GRAPH_DEPTH_START_BIT;	\
-	} while (0)
-
 #define TRACE_CONTEXT_BITS	4
 
 #define TRACE_FTRACE_START	TRACE_FTRACE_BIT
@@ -936,8 +907,38 @@ extern void free_fgraph_ops(struct trace_array *tr);
 
 enum {
 	TRACE_GRAPH_FL		= 1,
+
+	/*
+	 * In the very unlikely case that an interrupt came in
+	 * at a start of graph tracing, and we want to trace
+	 * the function in that interrupt, the depth can be greater
+	 * than zero, because of the preempted start of a previous
+	 * trace. In an even more unlikely case, depth could be 2
+	 * if a softirq interrupted the start of graph tracing,
+	 * followed by an interrupt preempting a start of graph
+	 * tracing in the softirq, and depth can even be 3
+	 * if an NMI came in at the start of an interrupt function
+	 * that preempted a softirq start of a function that
+	 * preempted normal context!!!! Luckily, it can't be
+	 * greater than 3, so the next two bits are a mask
+	 * of what the depth is when we set TRACE_GRAPH_FL
+	 */
+
+	TRACE_GRAPH_DEPTH_START_BIT,
+	TRACE_GRAPH_DEPTH_END_BIT,
 };
 
+static inline unsigned long ftrace_graph_depth(unsigned long *task_var)
+{
+	return (*task_var >> TRACE_GRAPH_DEPTH_START_BIT) & 3;
+}
+
+static inline void ftrace_graph_set_depth(unsigned long *task_var, int depth)
+{
+	*task_var &= ~(3 << TRACE_GRAPH_DEPTH_START_BIT);
+	*task_var |= (depth & 3) << TRACE_GRAPH_DEPTH_START_BIT;
+}
+
 #ifdef CONFIG_DYNAMIC_FTRACE
 extern struct ftrace_hash *ftrace_graph_hash;
 extern struct ftrace_hash *ftrace_graph_notrace_hash;
@@ -961,7 +962,7 @@ ftrace_graph_addr(unsigned long *task_var, struct ftrace_graph_ent *trace)
 		 * when the depth is zero.
 		 */
 		*task_var |= TRACE_GRAPH_FL;
-		trace_recursion_set_depth(trace->depth);
+		ftrace_graph_set_depth(task_var, trace->depth);
 
 		/*
 		 * If no irqs are to be traced, but a set_graph_function
@@ -986,7 +987,7 @@ ftrace_graph_addr_finish(struct fgraph_ops *gops, struct ftrace_graph_ret *trace
 	unsigned long *task_var = fgraph_get_task_var(gops);
 
 	if ((*task_var & TRACE_GRAPH_FL) &&
-	    trace->depth == trace_recursion_depth())
+	    trace->depth == ftrace_graph_depth(task_var))
 		*task_var &= ~TRACE_GRAPH_FL;
 }
 
-- 
2.20.1




* [RFC][PATCH 12/14 v2] function_graph: Move graph notrace bit to shadow stack global var
  2019-05-20 14:20 [RFC][PATCH 00/14 v2] function_graph: Rewrite to allow multiple users Steven Rostedt
                   ` (10 preceding siblings ...)
  2019-05-20 14:20 ` [RFC][PATCH 11/14 v2] function_graph: Move graph depth stored data " Steven Rostedt
@ 2019-05-20 14:20 ` Steven Rostedt
  2019-05-20 14:20 ` [RFC][PATCH 13/14 v2] function_graph: Implement fgraph_reserve_data() and fgraph_retrieve_data() Steven Rostedt
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 31+ messages in thread
From: Steven Rostedt @ 2019-05-20 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Namhyung Kim,
	Frank Ch. Eigler

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Using task->trace_recursion for the function graph no-trace logic was a bit
of an abuse of that variable. Now that each registered graph tracer has a
global var on the shadow stack, use that instead.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 kernel/trace/trace.h                 | 16 +++++++++-------
 kernel/trace/trace_functions_graph.c | 10 ++++++----
 2 files changed, 15 insertions(+), 11 deletions(-)
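
A minimal userspace sketch of the notrace flag handling, assuming the enum
layout from this series. The helpers graph_notrace_set(),
graph_notrace_test_and_clear() and notrace_demo() are hypothetical; the
patch open-codes these operations. Note that the value OR-ed into the
task variable has to be the mask (1 << 4 == 16), not the bit number (4),
which would land on the low depth bit instead.

```c
#include <assert.h>

enum {
	TRACE_GRAPH_FL = 1,
	TRACE_GRAPH_DEPTH_START_BIT,	/* 2 */
	TRACE_GRAPH_DEPTH_END_BIT,	/* 3 */
	TRACE_GRAPH_NOTRACE_BIT,	/* 4 */
};

#define TRACE_GRAPH_NOTRACE	(1UL << TRACE_GRAPH_NOTRACE_BIT)

/* Hypothetical helper: what the entry callback does on a notrace hit. */
static inline void graph_notrace_set(unsigned long *task_var)
{
	*task_var |= TRACE_GRAPH_NOTRACE;
}

/* Hypothetical helper: what the return callback does; 1 if it was set. */
static inline int graph_notrace_test_and_clear(unsigned long *task_var)
{
	if (!(*task_var & TRACE_GRAPH_NOTRACE))
		return 0;
	*task_var &= ~TRACE_GRAPH_NOTRACE;
	return 1;
}

/* Usage: setting and clearing the flag leaves the other bits alone. */
static inline void notrace_demo(void)
{
	unsigned long task_var = TRACE_GRAPH_FL;

	graph_notrace_set(&task_var);
	assert(graph_notrace_test_and_clear(&task_var) == 1);
	assert(task_var == TRACE_GRAPH_FL);
}
```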

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index c466c8a1a8cf..da623dd71b0c 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -566,13 +566,6 @@ enum {
  * can only be modified by current, we can reuse trace_recursion.
  */
 	TRACE_IRQ_BIT,
-
-	/*
-	 * To implement set_graph_notrace, if this bit is set, we ignore
-	 * function graph tracing of called functions, until the return
-	 * function is called to clear it.
-	 */
-	TRACE_GRAPH_NOTRACE_BIT,
 };
 
 #define trace_recursion_set(bit)	do { (current)->trace_recursion |= (1<<(bit)); } while (0)
@@ -926,8 +919,17 @@ enum {
 
 	TRACE_GRAPH_DEPTH_START_BIT,
 	TRACE_GRAPH_DEPTH_END_BIT,
+
+	/*
+	 * To implement set_graph_notrace, if this bit is set, we ignore
+	 * function graph tracing of called functions, until the return
+	 * function is called to clear it.
+	 */
+	TRACE_GRAPH_NOTRACE_BIT,
 };
 
+#define TRACE_GRAPH_NOTRACE		(1 << TRACE_GRAPH_NOTRACE_BIT)
+
 static inline unsigned long ftrace_graph_depth(unsigned long *task_var)
 {
 	return (*task_var >> TRACE_GRAPH_DEPTH_START_BIT) & 3;
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 054ec91e5086..20ee84350f43 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -125,6 +125,7 @@ static inline int ftrace_graph_ignore_irqs(void)
 int trace_graph_entry(struct ftrace_graph_ent *trace,
 		      struct fgraph_ops *gops)
 {
+	unsigned long *task_var = fgraph_get_task_var(gops);
 	struct trace_array *tr = gops->private;
 	struct trace_array_cpu *data;
 	unsigned long flags;
@@ -133,11 +134,11 @@ int trace_graph_entry(struct ftrace_graph_ent *trace,
 	int cpu;
 	int pc;
 
-	if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT))
+	if (*task_var & TRACE_GRAPH_NOTRACE)
 		return 0;
 
 	if (ftrace_graph_notrace_addr(trace->func)) {
-		trace_recursion_set(TRACE_GRAPH_NOTRACE_BIT);
+		*task_var |= TRACE_GRAPH_NOTRACE;
 		/*
 		 * Need to return 1 to have the return called
 		 * that will clear the NOTRACE bit.
@@ -239,6 +240,7 @@ void __trace_graph_return(struct trace_array *tr,
 void trace_graph_return(struct ftrace_graph_ret *trace,
 			struct fgraph_ops *gops)
 {
+	unsigned long *task_var = fgraph_get_task_var(gops);
 	struct trace_array *tr = gops->private;
 	struct trace_array_cpu *data;
 	unsigned long flags;
@@ -248,8 +250,8 @@ void trace_graph_return(struct ftrace_graph_ret *trace,
 
 	ftrace_graph_addr_finish(gops, trace);
 
-	if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT)) {
-		trace_recursion_clear(TRACE_GRAPH_NOTRACE_BIT);
+	if (*task_var & TRACE_GRAPH_NOTRACE) {
+		*task_var &= ~TRACE_GRAPH_NOTRACE;
 		return;
 	}
 
-- 
2.20.1




* [RFC][PATCH 13/14 v2] function_graph: Implement fgraph_reserve_data() and fgraph_retrieve_data()
  2019-05-20 14:20 [RFC][PATCH 00/14 v2] function_graph: Rewrite to allow multiple users Steven Rostedt
                   ` (11 preceding siblings ...)
  2019-05-20 14:20 ` [RFC][PATCH 12/14 v2] function_graph: Move graph notrace bit " Steven Rostedt
@ 2019-05-20 14:20 ` Steven Rostedt
  2019-05-20 14:20 ` [RFC][PATCH 14/14 v2] function_graph: Add selftest for passing local variables Steven Rostedt
  2019-05-22 14:19 ` [RFC][PATCH 00/14 v2] function_graph: Rewrite to allow multiple users Masami Hiramatsu
  14 siblings, 0 replies; 31+ messages in thread
From: Steven Rostedt @ 2019-05-20 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Namhyung Kim,
	Frank Ch. Eigler

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Add functions that can be called by an fgraph_ops entryfunc and retfunc to
store state between the entry of the function being traced and the exit of
the same function. The fgraph_ops entryfunc() may call fgraph_reserve_data()
to store up to 4 words on the task's shadow ret_stack, and this can then be
retrieved by fgraph_retrieve_data(), called by the corresponding retfunc().

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 include/linux/ftrace.h |   3 +
 kernel/trace/fgraph.c  | 244 +++++++++++++++++++++++++++++++++++------
 2 files changed, 213 insertions(+), 34 deletions(-)
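
A rough userspace sketch of the size arithmetic in fgraph_reserve_data():
the byte count is rounded up to whole longs, and the reservation then costs
those words plus one header word plus one reserve word. The names
MAX_DATA_WORDS, data_words(), slots_needed() and reserve_size_demo() are
assumptions for illustration only, and the explicit check for a negative
size is an addition of this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DATA_WORDS	4	/* mirrors FGRAPH_MAX_DATA_SIZE / sizeof(long) */

/* Round a byte count up to whole longs; -1 if negative or too large. */
static inline int data_words(int size_bytes)
{
	if (size_bytes < 0 ||
	    (size_t)size_bytes > MAX_DATA_WORDS * sizeof(long))
		return -1;
	return (int)(((size_t)size_bytes + sizeof(long) - 1) / sizeof(long));
}

/* ret_stack slots consumed: data words + header word + reserve word. */
static inline int slots_needed(int size_bytes)
{
	int words = data_words(size_bytes);

	return words < 0 ? -1 : words + 2;
}

/* Usage: one byte still takes a full word, plus the two bookkeeping words. */
static inline void reserve_size_demo(void)
{
	assert(data_words(1) == 1);
	assert(slots_needed(1) == 3);
}
```

This is why even a single stored byte moves curr_ret_stack by three slots.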

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index a0bdd1745e56..5b252dc9c1e6 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -757,6 +757,9 @@ struct fgraph_ops {
 	int				idx;
 };
 
+void *fgraph_reserve_data(int size_bytes);
+void *fgraph_retrieve_data(void);
+
 /*
  * Stack of return addresses for functions
  * of a thread.
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index c225b04bcd00..f087a46fa473 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -36,25 +36,36 @@
  * bits: 14 - 15	Type of storage
  *			  0 - reserved
  *			  1 - fgraph_array index
+ *			  2 - reserved data
  * For fgraph_array_index:
  *  bits: 16 - 23	The fgraph_ops fgraph_array index
  *
+ * For reserved data:
+ *  bits: 16 - 17	The size in words that is stored
+ *
  * That is, at the end of function_graph_enter, if the first and forth
  * fgraph_ops on the fgraph_array[] (index 0 and 3) needs their retfunc called
- * on the return of the function being traced, this is what will be on the
- * task's shadow ret_stack: (the stack grows upward)
+ * on the return of the function being traced, and the fourth fgraph_ops
+ * stored two words of data, this is what will be on the task's shadow
+ * ret_stack: (the stack grows upward)
+ *
+ * |                                     | <- task->curr_ret_stack
+ * +-------------------------------------+
+ * | (3 << FGRAPH_ARRAY_SHIFT)|type:1|(5)| ( 3 for index of fourth fgraph_ops)
+ * +-------------------------------------+
+ * | (3 << FGRAPH_DATA_SHIFT)|type:2|(4) | ( Data with size of 2 words)
+ * +-------------------------------------+ ( It is 4 words from the ret_stack)
+ * |         STORED DATA WORD 2          |
+ * |         STORED DATA WORD 1          |
+ * +-------------------------------------+
+ * | (0 << FGRAPH_ARRAY_SHIFT)|type:1|(1)| ( 0 for index of first fgraph_ops)
+ * +-------------------------------------+
+ * | struct ftrace_ret_stack             |
+ * |   (stores the saved ret pointer)    |
+ * +-------------------------------------+
+ * |             (X) | (N)               | ( N words away from last ret_stack)
+ * |                                     |
  *
- * |                                  | <- task->curr_ret_stack
- * +----------------------------------+
- * | (3 << FGRAPH_ARRAY_SHIFT)|(2)    | ( 3 for index of fourth fgraph_ops)
- * +----------------------------------+
- * | (0 << FGRAPH_ARRAY_SHIFT)|(1)    | ( 0 for index of first fgraph_ops)
- * +----------------------------------+
- * | struct ftrace_ret_stack          |
- * |   (stores the saved ret pointer) |
- * +----------------------------------+
- * |             (X) | (N)            | ( N words away from previous ret_stack)
- * |                                  |
  *
  * If a backtrace is required, and the real return pointer needs to be
  * fetched, then it looks at the task's curr_ret_stack index, if it
@@ -75,12 +86,17 @@
 enum {
 	FGRAPH_TYPE_RESERVED	= 0,
 	FGRAPH_TYPE_ARRAY	= 1,
+	FGRAPH_TYPE_DATA	= 2,
 };
 
 #define FGRAPH_ARRAY_SIZE	16
 #define FGRAPH_ARRAY_MASK	((1 << FGRAPH_ARRAY_SIZE) - 1)
 #define FGRAPH_ARRAY_SHIFT	(FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_SIZE)
 
+#define FGRAPH_DATA_SIZE	2
+#define FGRAPH_DATA_MASK	((1 << FGRAPH_DATA_SIZE) - 1)
+#define FGRAPH_DATA_SHIFT	(FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_SIZE)
+
 /* Currently the max stack index can't be more than register callers */
 #define FGRAPH_MAX_INDEX	FGRAPH_ARRAY_SIZE
 
@@ -96,6 +112,8 @@ enum {
 
 #define RET_STACK(t, index) ((struct ftrace_ret_stack *)(&(t)->ret_stack[index]))
 
+#define FGRAPH_MAX_DATA_SIZE (sizeof(long) * 4)
+
 /*
  * Each fgraph_ops has a reservered unsigned long at the end (top) of the
  * ret_stack to store task specific state.
@@ -110,21 +128,44 @@ static int fgraph_array_cnt;
 
 static struct fgraph_ops *fgraph_array[FGRAPH_ARRAY_SIZE];
 
+static inline int __get_index(unsigned long val)
+{
+	return val & FGRAPH_RET_INDEX_MASK;
+}
+
+static inline int __get_type(unsigned long val)
+{
+	return (val >> FGRAPH_TYPE_SHIFT) & FGRAPH_TYPE_MASK;
+}
+
+static inline int __get_array(unsigned long val)
+{
+	return (val >> FGRAPH_ARRAY_SHIFT) & FGRAPH_ARRAY_MASK;
+}
+
+static inline int __get_data(unsigned long val)
+{
+	return (val >> FGRAPH_DATA_SHIFT) & FGRAPH_DATA_MASK;
+}
+
 static inline int get_ret_stack_index(struct task_struct *t, int offset)
 {
-	return current->ret_stack[offset] & FGRAPH_RET_INDEX_MASK;
+	return __get_index(current->ret_stack[offset]);
 }
 
 static inline int get_fgraph_type(struct task_struct *t, int offset)
 {
-	return (current->ret_stack[offset] >> FGRAPH_TYPE_SHIFT) &
-		FGRAPH_TYPE_MASK;
+	return __get_type(current->ret_stack[offset]);
 }
 
 static inline int get_fgraph_array(struct task_struct *t, int offset)
 {
-	return (current->ret_stack[offset] >> FGRAPH_ARRAY_SHIFT) &
-		FGRAPH_ARRAY_MASK;
+	return __get_array(current->ret_stack[offset]);
+}
+
+static inline int get_data_idx(struct task_struct *t, int offset)
+{
+	return __get_data(current->ret_stack[offset]);
 }
 
 /* ftrace_graph_entry set to this to tell some archs to run function graph */
@@ -161,6 +202,121 @@ static void ret_stack_init_task_vars(unsigned long *ret_stack)
 	memset(gvals, 0, sizeof(*gvals) * FGRAPH_ARRAY_SIZE);
 }
 
+/**
+ * fgraph_reserve_data - Reserve storage on the task's ret_stack
+ * @size_bytes: The size in bytes to reserve (max of 4 words in size)
+ *
+ * Reserves space of up to 4 words (in word increments) on the
+ * task's ret_stack shadow stack, for a given fgraph_ops during
+ * the entryfunc() call. If entryfunc() returns zero, the storage
+ * is discarded. An entryfunc() can only call this once per iteration.
+ * The fgraph_ops retfunc() can retrieve this stored data with
+ * fgraph_retrieve_data().
+ *
+ * Returns: On success, a pointer to the data on the stack.
+ *   Otherwise, NULL if there's not enough space left on the
+ *   ret_stack for the data, or if fgraph_reserve_data() was called
+ *   more than once for a single entryfunc() call.
+ */
+void *fgraph_reserve_data(int size_bytes)
+{
+	unsigned long val;
+	void *data;
+	int curr_ret_stack = current->curr_ret_stack;
+	int data_size;
+	int size;
+
+	if (size_bytes > FGRAPH_MAX_DATA_SIZE)
+		return NULL;
+
+	/* Convert to number of longs + data word */
+	data_size = ALIGN(size_bytes, sizeof(long)) / sizeof(long) + 1;
+
+	/* The size to add to ret_stack (including the reserve word) */
+	size = data_size + 1;
+
+	val = current->ret_stack[curr_ret_stack - 1];
+
+	switch (__get_type(val)) {
+	case FGRAPH_TYPE_RESERVED:
+		/*
+		 * A reserve word is only saved after the ret_stack
+		 * or after a data storage, not after an fgraph_array
+		 * entry. It's OK if it's after the ret_stack in which
+		 * case the index will be one, but if the index is
+		 * greater than 1 it means it's a double call to
+		 * fgraph_reserve_data()
+		 */
+		if (__get_index(val) > 1)
+			return NULL;
+		/*
+		 * Leave the reserve in case the entryfunc() doesn't
+		 * want to be recorded.
+		 */
+		break;
+	case FGRAPH_TYPE_ARRAY:
+		break;
+	default:
+		return NULL;
+	}
+	data = &current->ret_stack[curr_ret_stack];
+
+	curr_ret_stack += size;
+	if (unlikely(curr_ret_stack >= SHADOW_STACK_MAX_INDEX))
+		return NULL;
+
+	val = __get_index(val) + size;
+
+	/* Set the last word to be reserved */
+	current->ret_stack[curr_ret_stack - 1] = val;
+
+	/* Make sure interrupts see this */
+	barrier();
+	current->curr_ret_stack = curr_ret_stack;
+	/* Again sync with interrupts, and reset reserve */
+	current->ret_stack[curr_ret_stack - 1] = val;
+
+	val = (data_size << FGRAPH_DATA_SHIFT) |
+		(FGRAPH_TYPE_DATA << FGRAPH_TYPE_SHIFT) |
+		(val - 1);
+
+	/* Save the data header */
+	current->ret_stack[curr_ret_stack - 2] = val;
+
+	return data;
+}
+
+/**
+ * fgraph_retrieve_data - Retrieve stored data from fgraph_reserve_data()
+ *
+ * This is to be called by a fgraph_ops retfunc(), to retrieve data that
+ * was stored by the fgraph_ops entryfunc() on the function entry.
+ * That is, this will retrieve the data that was reserved on the
+ * entry of the function that corresponds to the exit of the function
+ * that the fgraph_ops retfunc() is called on.
+ *
+ * Returns: The stored data from fgraph_reserve_data() called by the
+ *    matching entryfunc() for the retfunc() this is called from.
+ *   Or NULL if there was nothing stored.
+ */
+void *fgraph_retrieve_data(void)
+{
+	unsigned long val;
+	int curr_ret_stack = current->curr_ret_stack;
+
+	/* Top of stack is the fgraph_ops */
+	val = current->ret_stack[curr_ret_stack - 1];
+	/* Check if there's nothing between the fgraph_ops and ret_stack */
+	if (__get_index(val) == 1)
+		return NULL;
+	val = current->ret_stack[curr_ret_stack - 2];
+	if (__get_type(val) != FGRAPH_TYPE_DATA)
+		return NULL;
+
+	return &current->ret_stack[curr_ret_stack -
+				   (__get_data(val) + 1)];
+}
+
 /**
  * fgraph_get_task_var - retrieve a task specific state variable
  * @gops: The ftrace_ops that owns the task specific variable
@@ -330,6 +486,7 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 			 unsigned long frame_pointer, unsigned long *retp)
 {
 	struct ftrace_graph_ent trace;
+	int save_curr_ret_stack;
 	int offset;
 	int start;
 	int type;
@@ -357,8 +514,10 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 			atomic_inc(&current->trace_overrun);
 			break;
 		}
+		save_curr_ret_stack = current->curr_ret_stack;
 		if (ftrace_ops_test(&gops->ops, func, NULL) &&
 		    gops->entryfunc(&trace, gops)) {
+			/* Note, curr_ret_stack could be changed by entryfunc() */
 			offset = current->curr_ret_stack;
 			/* Check the top level stored word */
 			type = get_fgraph_type(current, offset - 1);
@@ -392,6 +551,9 @@ int function_graph_enter(unsigned long ret, unsigned long func,
 			barrier();
 			current->ret_stack[offset] = val;
 			cnt++;
+		} else {
+			/* Clear out any saved storage */
+			current->curr_ret_stack = save_curr_ret_stack;
 		}
 	}
 
@@ -502,10 +664,10 @@ unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
 	struct ftrace_ret_stack *ret_stack;
 	struct ftrace_graph_ret trace;
 	unsigned long ret;
-	int offset;
+	int curr_ret_stack;
+	int stop_at;
 	int index;
 	int idx;
-	int i;
 
 	ret_stack = ftrace_pop_return_trace(&trace, &ret, frame_pointer);
 
@@ -518,24 +680,38 @@ unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
 
 	trace.rettime = trace_clock_local();
 
-	offset = current->curr_ret_stack - 1;
-	index = get_ret_stack_index(current, offset);
+	curr_ret_stack = current->curr_ret_stack;
+	index = get_ret_stack_index(current, curr_ret_stack - 1);
+
+	stop_at = curr_ret_stack - index;
 
 	/* index has to be at least one! Optimize for it */
-	i = 0;
 	do {
-		idx = get_fgraph_array(current, offset - i);
-		fgraph_array[idx]->retfunc(&trace, fgraph_array[idx]);
-		i++;
-	} while (i < index);
+		unsigned long val;
+
+		val = current->ret_stack[curr_ret_stack - 1];
+		switch (__get_type(val)) {
+		case FGRAPH_TYPE_ARRAY:
+			idx = __get_array(val);
+			fgraph_array[idx]->retfunc(&trace, fgraph_array[idx]);
+			/* Fall through */
+		case FGRAPH_TYPE_RESERVED:
+			curr_ret_stack--;
+			break;
+		case FGRAPH_TYPE_DATA:
+			curr_ret_stack -= __get_data(val);
+			break;
+		default:
+			WARN_ONCE(1, "Bad fgraph ret_stack data type %d",
+				  __get_type(val));
+			curr_ret_stack--;
+		}
+		/* Make sure interrupts see the update after the above */
+		barrier();
+		current->curr_ret_stack = curr_ret_stack;
+	} while (curr_ret_stack > stop_at);
 
-	/*
-	 * The ftrace_graph_return() may still access the current
-	 * ret_stack structure, we need to make sure the update of
-	 * curr_ret_stack is after that.
-	 */
-	barrier();
-	current->curr_ret_stack -= index + FGRAPH_RET_INDEX;
+	current->curr_ret_stack -= FGRAPH_RET_INDEX;
 	current->curr_ret_depth--;
 	return ret;
 }
-- 
2.20.1




* [RFC][PATCH 14/14 v2] function_graph: Add selftest for passing local variables
  2019-05-20 14:20 [RFC][PATCH 00/14 v2] function_graph: Rewrite to allow multiple users Steven Rostedt
                   ` (12 preceding siblings ...)
  2019-05-20 14:20 ` [RFC][PATCH 13/14 v2] function_graph: Implement fgraph_reserve_data() and fgraph_retrieve_data() Steven Rostedt
@ 2019-05-20 14:20 ` Steven Rostedt
  2019-05-22 14:19 ` [RFC][PATCH 00/14 v2] function_graph: Rewrite to allow multiple users Masami Hiramatsu
  14 siblings, 0 replies; 31+ messages in thread
From: Steven Rostedt @ 2019-05-20 14:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, Peter Zijlstra,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Namhyung Kim,
	Frank Ch. Eigler

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

Add a boot-up selftest that passes variables from a function entry to a
function exit, and makes sure that they do get passed around.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 kernel/trace/trace_selftest.c | 161 ++++++++++++++++++++++++++++++++++
 1 file changed, 161 insertions(+)
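
The round trip that the selftest checks can be sketched in userspace as
below: store a constant of a given width into a reserved slot, then read it
back at the same width. The constants are the ones from this patch;
store_value(), load_value() and storage_demo() are hypothetical names, and
memcpy() is used here in place of the pointer casts in the selftest. It
assumes the usual 1/2/4/8 byte sizes for char/short/int/long long.

```c
#include <assert.h>
#include <string.h>

#define BYTE_NUMBER 123
#define SHORT_NUMBER 12345
#define WORD_NUMBER 1234567890
#define LONG_NUMBER 1234567890123456789LL

/* Write the test constant for @size bytes into @p; 0 if size is unknown. */
static inline int store_value(void *p, int size)
{
	char c = BYTE_NUMBER;
	short s = SHORT_NUMBER;
	int w = WORD_NUMBER;
	long long l = LONG_NUMBER;

	switch (size) {
	case 1: memcpy(p, &c, sizeof(c)); return 1;
	case 2: memcpy(p, &s, sizeof(s)); return 1;
	case 4: memcpy(p, &w, sizeof(w)); return 1;
	case 8: memcpy(p, &l, sizeof(l)); return 1;
	}
	return 0;
}

/* Read back a value of @size bytes from @p; -1 if size is unknown. */
static inline long long load_value(const void *p, int size)
{
	char c;
	short s;
	int w;
	long long l;

	switch (size) {
	case 1: memcpy(&c, p, sizeof(c)); return c;
	case 2: memcpy(&s, p, sizeof(s)); return s;
	case 4: memcpy(&w, p, sizeof(w)); return w;
	case 8: memcpy(&l, p, sizeof(l)); return l;
	}
	return -1;
}

/* Usage: a long long slot is large enough for every width tested. */
static inline void storage_demo(void)
{
	long long slot = 0;

	assert(store_value(&slot, 2) == 1);
	assert(load_value(&slot, 2) == SHORT_NUMBER);
}
```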

diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index facd5d1c05e7..9318677b5bf2 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -718,6 +718,165 @@ trace_selftest_startup_function(struct tracer *trace, struct trace_array *tr)
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 
+#ifdef CONFIG_DYNAMIC_FTRACE
+
+#define BYTE_NUMBER 123
+#define SHORT_NUMBER 12345
+#define WORD_NUMBER 1234567890
+#define LONG_NUMBER 1234567890123456789LL
+
+static int fgraph_store_size __initdata;
+static const char *fgraph_store_type_name __initdata;
+static char *fgraph_error_str __initdata;
+static char fgraph_error_str_buf[128] __initdata;
+
+static __init int store_entry(struct ftrace_graph_ent *trace,
+			      struct fgraph_ops *gops)
+{
+	const char *type = fgraph_store_type_name;
+	int size = fgraph_store_size;
+	void *p;
+
+	p = fgraph_reserve_data(size);
+	if (!p) {
+		snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+			 "Failed to reserve %s\n", type);
+		fgraph_error_str = fgraph_error_str_buf;
+		return 0;
+	}
+
+	switch (fgraph_store_size) {
+	case 1:
+		*(char *)p = BYTE_NUMBER;
+		break;
+	case 2:
+		*(short *)p = SHORT_NUMBER;
+		break;
+	case 4:
+		*(int *)p = WORD_NUMBER;
+		break;
+	case 8:
+		*(long long *)p = LONG_NUMBER;
+		break;
+	}
+
+	return 1;
+}
+
+static __init void store_return(struct ftrace_graph_ret *trace,
+				struct fgraph_ops *gops)
+{
+	const char *type = fgraph_store_type_name;
+	long long expect = 0;
+	long long found = -1;
+	char *p;
+
+	p = fgraph_retrieve_data();
+	if (!p) {
+		snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+			 "Failed to retrieve %s\n", type);
+		fgraph_error_str = fgraph_error_str_buf;
+		return;
+	}
+
+	switch (fgraph_store_size) {
+	case 1:
+		expect = BYTE_NUMBER;
+		found = *(char *)p;
+		break;
+	case 2:
+		expect = SHORT_NUMBER;
+		found = *(short *)p;
+		break;
+	case 4:
+		expect = WORD_NUMBER;
+		found = *(int *)p;
+		break;
+	case 8:
+		expect = LONG_NUMBER;
+		found = *(long long *)p;
+		break;
+	}
+
+	if (found != expect) {
+		snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+			 "%s returned not %lld but %lld\n", type, expect, found);
+		fgraph_error_str = fgraph_error_str_buf;
+		return;
+	}
+	fgraph_error_str = NULL;
+}
+
+static struct fgraph_ops store_bytes __initdata = {
+	.entryfunc		= store_entry,
+	.retfunc		= store_return,
+};
+
+static int __init test_graph_storage_type(const char *name, int size)
+{
+	char *func_name;
+	int len;
+	int ret;
+
+	fgraph_store_type_name = name;
+	fgraph_store_size = size;
+
+	snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+		 "Failed to execute storage %s\n", name);
+	fgraph_error_str = fgraph_error_str_buf;
+
+	printk(KERN_CONT "PASSED\n");
+	pr_info("Testing fgraph storage of %d byte%s: ", size, size > 1 ? "s" : "");
+
+	func_name = "*" __stringify(DYN_FTRACE_TEST_NAME);
+	len = strlen(func_name);
+
+	ret = ftrace_set_filter(&store_bytes.ops, func_name, len, 1);
+	if (ret && ret != -ENODEV) {
+		pr_cont("*Could not set filter* ");
+		return -1;
+	}
+
+	ret = register_ftrace_graph(&store_bytes);
+	if (ret) {
+		printk(KERN_WARNING "Failed to init store_bytes fgraph tracing\n");
+		return -1;
+	}
+
+	DYN_FTRACE_TEST_NAME();
+
+	unregister_ftrace_graph(&store_bytes);
+
+	if (fgraph_error_str) {
+		printk(KERN_CONT "*** %s ***", fgraph_error_str);
+		return -1;
+	}
+
+	return 0;
+}
+/* Test the storage passed across function_graph entry and return */
+static __init int test_graph_storage(void)
+{
+	int ret;
+
+	ret = test_graph_storage_type("byte", 1);
+	if (ret)
+		return ret;
+	ret = test_graph_storage_type("short", 2);
+	if (ret)
+		return ret;
+	ret = test_graph_storage_type("word", 4);
+	if (ret)
+		return ret;
+	ret = test_graph_storage_type("long long", 8);
+	if (ret)
+		return ret;
+	return 0;
+}
+#else
+static inline int test_graph_storage(void) { return 0; }
+#endif /* CONFIG_DYNAMIC_FTRACE */
+
 /* Maximum number of functions to trace before diagnosing a hang */
 #define GRAPH_MAX_FUNC_TEST	100000000
 
@@ -805,6 +964,8 @@ trace_selftest_startup_function_graph(struct tracer *trace,
 		goto out;
 	}
 
+	ret = test_graph_storage();
+
 	/* Don't test dynamic tracing, the function tracer already did */
 
 out:
-- 
2.20.1




* Re: [RFC][PATCH 00/14 v2] function_graph: Rewrite to allow multiple users
  2019-05-20 14:20 [RFC][PATCH 00/14 v2] function_graph: Rewrite to allow multiple users Steven Rostedt
                   ` (13 preceding siblings ...)
  2019-05-20 14:20 ` [RFC][PATCH 14/14 v2] function_graph: Add selftest for passing local variables Steven Rostedt
@ 2019-05-22 14:19 ` Masami Hiramatsu
  2019-05-22 14:40   ` Steven Rostedt
  14 siblings, 1 reply; 31+ messages in thread
From: Masami Hiramatsu @ 2019-05-22 14:19 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Masami Hiramatsu, Josh Poimboeuf,
	Frederic Weisbecker, Joel Fernandes, Andy Lutomirski,
	Mark Rutland, Namhyung Kim, Frank Ch. Eigler

On Mon, 20 May 2019 10:20:01 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> 
> The background for this is explained in the V1 version found here:
> 
>  http://lkml.kernel.org/r/20181122012708.491151844@goodmis.org
> 
> The TL;DR; is this:
> 
>  The function graph tracer required a rewrite, mainly because it
>  can only allow one callback registered at a time. The main motivation
>  for this change is to allow kretprobes to use the code of function
>  graph tracer, which should allow all archs that have function graph
>  tracing to also have kretprobes with no extra work.
> 
> Masami told me that one requirement was to allow the function entry
> callback to store data on the shadow stack that can be retrieved by
> the function return callback. I added this, as well as a per-task
> variable (used by one of the function graph users).
> 
> The two functions to allow the storing of data on the stack and
> retrieval of it are:
> 
>  void *fgraph_reserve_data(int size_in_bytes)
> 
>     Allows the entry function to reserve up to 4 words of data on
>     the shadow stack. On success, a pointer to the contents is returned.
>     This may be only called once per entry function.
> 
>  void *fgraph_retrieve_data(void)
> 
>     Allows the return function to retrieve the reserved data that was
>     allocated by the entry function.

Nice! This seems good for kretprobe too. I'll review and try to port
kretprobe to this framework.

Thank you!

> 
> Note, this code has passed my full test suite.
> 
> Changes since v1:
> 
>   - Well, the first part of that series was already merged.
>     But that was just the preparation for this part.
> 
>   - Allocate a page for the shadow stack split it up that way.
>     When the stack is full, we stop allowing more to be added (stop tracing).
> 
>   - Added the reserve and retrieve of private data on the shadow stack
>     for individual entry/return callbacks to pass data to each other.
> 
>   - Added a "per task" data that can be used by a fgraph_ops for all
>     function callbacks for a specific task.
> 
> Steven Rostedt (VMware) (14):
>       function_graph: Convert ret_stack to a series of longs
>       function_graph: Add an array structure that will allow multiple callbacks
>       function_graph: Allow multiple users to attach to function graph
>       function_graph: Remove logic around ftrace_graph_entry and return
>       ftrace/function_graph: Pass fgraph_ops to function graph callbacks
>       ftrace: Allow function_graph tracer to be enabled in instances
>       ftrace: Allow ftrace startup flags exist without dynamic ftrace
>       function_graph: Have the instances use their own ftrace_ops for filtering
>       function_graph: Add "task variables" per task for fgraph_ops
>       function_graph: Move set_graph_function tests to shadow stack global var
>       function_graph: Move graph depth stored data to shadow stack global var
>       function_graph: Move graph notrace bit to shadow stack global var
>       function_graph: Implement fgraph_reserve_data() and fgraph_retrieve_data()
>       function_graph: Add selftest for passing local variables
> 
> ----
>  include/linux/ftrace.h               |  37 +-
>  include/linux/sched.h                |   2 +-
>  kernel/trace/fgraph.c                | 862 ++++++++++++++++++++++++++++-------
>  kernel/trace/ftrace.c                |  13 +-
>  kernel/trace/ftrace_internal.h       |   2 -
>  kernel/trace/trace.h                 | 132 +++---
>  kernel/trace/trace_functions.c       |   7 +
>  kernel/trace/trace_functions_graph.c |  96 ++--
>  kernel/trace/trace_irqsoff.c         |  10 +-
>  kernel/trace/trace_sched_wakeup.c    |  10 +-
>  kernel/trace/trace_selftest.c        | 168 ++++++-
>  11 files changed, 1048 insertions(+), 291 deletions(-)


-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: [RFC][PATCH 00/14 v2] function_graph: Rewrite to allow multiple users
  2019-05-22 14:19 ` [RFC][PATCH 00/14 v2] function_graph: Rewrite to allow multiple users Masami Hiramatsu
@ 2019-05-22 14:40   ` Steven Rostedt
  2019-05-29  6:47     ` Masami Hiramatsu
  0 siblings, 1 reply; 31+ messages in thread
From: Steven Rostedt @ 2019-05-22 14:40 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Namhyung Kim,
	Frank Ch. Eigler

On Wed, 22 May 2019 23:19:55 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> >  void *fgraph_reserve_data(int size_in_bytes)
> > 
> >     Allows the entry function to reserve up to 4 words of data on
> >     the shadow stack. On success, a pointer to the contents is returned.
> >     This may be only called once per entry function.
> > 
> >  void *fgraph_retrieve_data(void)
> > 
> >     Allows the return function to retrieve the reserved data that was
> >     allocated by the entry function.  
> 
> Nice! This seems good for kretprobe too. I'll review and try to port
> kretprobe on this framework.

If you'd rather pull from my git repo than download all the patches,
they are currently available in my ftrace/fgraph-multi branch.

-- Steve


* Re: [RFC][PATCH 01/14 v2] function_graph: Convert ret_stack to a series of longs
  2019-05-20 14:20 ` [RFC][PATCH 01/14 v2] function_graph: Convert ret_stack to a series of longs Steven Rostedt
@ 2019-05-24 11:11   ` Peter Zijlstra
  2019-05-24 12:05     ` Steven Rostedt
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2019-05-24 11:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Namhyung Kim,
	Frank Ch. Eigler

On Mon, May 20, 2019 at 10:20:02AM -0400, Steven Rostedt wrote:

> +#define FGRAPH_RET_SIZE (sizeof(struct ftrace_ret_stack))
> +#define FGRAPH_RET_INDEX (ALIGN(FGRAPH_RET_SIZE, sizeof(long)) / sizeof(long))

I think you want to write that like:

	BUILD_BUG_ON(sizeof(struct ftrace_ret_stack) % sizeof(long));

It'd be very weird for that sizeof not to be right.

> +#define SHADOW_STACK_SIZE (PAGE_SIZE)

Do we really need that big a shadow stack?

> +#define SHADOW_STACK_INDEX			\
> +	(ALIGN(SHADOW_STACK_SIZE, sizeof(long)) / sizeof(long))
> +/* Leave on a buffer at the end */
> +#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - FGRAPH_RET_INDEX)
> +
> +#define RET_STACK(t, index) ((struct ftrace_ret_stack *)(&(t)->ret_stack[index]))
> +#define RET_STACK_INC(c) ({ c += FGRAPH_RET_INDEX; })
> +#define RET_STACK_DEC(c) ({ c -= FGRAPH_RET_INDEX; })

I'm thinking something like:

#define RET_PUSH(s, val)				\
do {							\
	(s) -= sizeof(val);				\
	*(typeof(val) *)(s) = (val);			\
} while (0)

#define RET_POP(s, type)				\
({							\
	type *__ptr = (void *)(s);			\
	(s) += sizeof(type);				\
	*__ptr;						\
})

Would be clearer?
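
A userspace sketch of how push/pop helpers along these lines could behave,
assuming a byte-addressed stack pointer that grows downward. The macro names
follow the suggestion above; demo_stack and demo_push_pop() are hypothetical,
and the store is written with an explicit dereference. It relies on the GNU C
__typeof__ and statement-expression extensions, as kernel code does.

```c
#include <assert.h>

#define RET_PUSH(s, val)					\
do {								\
	(s) -= sizeof(val);					\
	*(__typeof__(val) *)(s) = (val);			\
} while (0)

#define RET_POP(s, type)					\
({								\
	type *__ptr = (void *)(s);				\
	(s) += sizeof(type);					\
	*__ptr;							\
})

static long demo_stack[32];

/* Push two values and pop them back in reverse order. */
static long demo_push_pop(long a, long b)
{
	char *sp = (char *)&demo_stack[32];	/* one past the end: empty */
	long first, second;

	RET_PUSH(sp, a);
	RET_PUSH(sp, b);
	second = RET_POP(sp, long);		/* b comes off first */
	first = RET_POP(sp, long);
	assert(sp == (char *)&demo_stack[32]);	/* balanced again */
	return first * 100 + second;
}
```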


* Re: [RFC][PATCH 03/14 v2] function_graph: Allow multiple users to attach to function graph
  2019-05-20 14:20 ` [RFC][PATCH 03/14 v2] function_graph: Allow multiple users to attach to function graph Steven Rostedt
@ 2019-05-24 11:26   ` Peter Zijlstra
  2019-05-24 12:12     ` Steven Rostedt
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2019-05-24 11:26 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Namhyung Kim,
	Frank Ch. Eigler

On Mon, May 20, 2019 at 10:20:04AM -0400, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> 
> Allow for multiple users to attach to function graph tracer at the same
> time. Only 16 simultaneous users can attach to the tracer. This is because
> there's an array that stores the pointers to the attached fgraph_ops. When
> a function being traced is entered, each of the ftrace_ops entryfunc is
> called and if it returns non zero, its index into the array will be added to
> the shadow stack.
> 
> On exit of the function being traced, the shadow stack will contain the
> indexes of the ftrace_ops on the array that want their retfunc to be called.
> 
> Because a function may sleep for a long time (if a task sleeps itself), the
> return of the function may be literally days later. If the ftrace_ops is
> removed, its place on the array is replaced with a ftrace_ops that contains
> the stub functions and that will be called when the function finally
> returns.

But but but but.. why not add all the required bits to the shadow stack
in the first place and do away with the array entirely?

So on ret, just keep POP'ing until either the stack is empty or the
entry is for another function.




* Re: [RFC][PATCH 01/14 v2] function_graph: Convert ret_stack to a series of longs
  2019-05-24 11:11   ` Peter Zijlstra
@ 2019-05-24 12:05     ` Steven Rostedt
  2019-06-03 11:30       ` Masami Hiramatsu
  0 siblings, 1 reply; 31+ messages in thread
From: Steven Rostedt @ 2019-05-24 12:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Namhyung Kim,
	Frank Ch. Eigler

On Fri, 24 May 2019 13:11:44 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, May 20, 2019 at 10:20:02AM -0400, Steven Rostedt wrote:
> 
> > +#define FGRAPH_RET_SIZE (sizeof(struct ftrace_ret_stack))
> > +#define FGRAPH_RET_INDEX (ALIGN(FGRAPH_RET_SIZE, sizeof(long)) / sizeof(long))  
> 
> I think you want to write that like:
> 
> 	BUILD_BUG_ON(sizeof(struct ftrace_ret_stack) % sizeof(long));

Sure.

> 
> It'd be very weird for that sizeof not to be right.

Agreed, but I was paranoid. The BUILD_BUG_ON() would also work.
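
The two approaches can be compared in a small userspace sketch. The struct
layout below is hypothetical (the real struct ftrace_ret_stack depends on the
kernel configuration), DEMO_ALIGN() is a local stand-in for the kernel's
ALIGN(), and C11 _Static_assert plays the role of BUILD_BUG_ON().

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the kernel's ALIGN(); 'a' must be a power of two. */
#define DEMO_ALIGN(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* Hypothetical layout, for illustration only. */
struct demo_ret_stack {
	unsigned long ret;
	unsigned long func;
	unsigned long long calltime;
};

#define DEMO_RET_SIZE  (sizeof(struct demo_ret_stack))
#define DEMO_RET_INDEX (DEMO_ALIGN(DEMO_RET_SIZE, sizeof(long)) / sizeof(long))

/* Build fails if an entry is not a whole number of longs. */
_Static_assert(sizeof(struct demo_ret_stack) % sizeof(long) == 0,
	       "ret_stack entry must be a whole number of longs");

/* With the assertion in place, the rounding above is a no-op. */
static size_t demo_ret_index(void)
{
	assert(DEMO_RET_INDEX * sizeof(long) == DEMO_RET_SIZE);
	return DEMO_RET_INDEX;
}
```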

> 
> > +#define SHADOW_STACK_SIZE (PAGE_SIZE)  
> 
> Do we really need that big a shadow stack?

Well, this is a sticky point. I allow up to 16 users at a time
(although I can't imagine more than 5, but you never know), and each
user adds a long and up to 4 more words (which is probably unlikely
anyway). And then we can have deep call stacks (we are getting deeper
each release it seems).

I figured, I start with a page size, and then in the future we can make
it dynamic, or shrink it if it proves to be too much.
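
A rough budget shows why the size is a judgment call. The sketch below is
back-of-the-envelope arithmetic under stated assumptions: 4 KiB pages, 8-byte
longs, and a hypothetical 4-long per-call record. Only the 16-user limit and
the "one long plus up to 4 words" per user come from the message above.

```c
#include <assert.h>

#define DEMO_PAGE_SIZE   4096	/* assumption: 4 KiB pages */
#define DEMO_LONG_SIZE   8	/* assumption: 64-bit longs */
#define DEMO_FRAME_LONGS 4	/* hypothetical size of the per-call record */
#define DEMO_MAX_USERS   16	/* limit stated in the series */
#define DEMO_DATA_WORDS  4	/* max reserved words per user */

/* Longs used by one traced call with 'users' callbacks attached. */
static int demo_longs_per_call(int users, int data_words)
{
	assert(users >= 0 && users <= DEMO_MAX_USERS);
	assert(data_words >= 0 && data_words <= DEMO_DATA_WORDS);
	return DEMO_FRAME_LONGS + users * (1 + data_words);
}

/* Call depth that fits in one page under those assumptions. */
static int demo_max_depth(int users, int data_words)
{
	int total_longs = DEMO_PAGE_SIZE / DEMO_LONG_SIZE;	/* 512 */

	return total_longs / demo_longs_per_call(users, data_words);
}
```

Under these assumptions the worst case (16 users, 4 words each) fits only 6
nested calls in a page, while a single user reserving nothing fits 102.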

> 
> > +#define SHADOW_STACK_INDEX			\
> > +	(ALIGN(SHADOW_STACK_SIZE, sizeof(long)) / sizeof(long))
> > +/* Leave on a buffer at the end */
> > +#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - FGRAPH_RET_INDEX)
> > +
> > +#define RET_STACK(t, index) ((struct ftrace_ret_stack *)(&(t)->ret_stack[index]))
> > +#define RET_STACK_INC(c) ({ c += FGRAPH_RET_INDEX; })
> > +#define RET_STACK_DEC(c) ({ c -= FGRAPH_RET_INDEX; })  
> 
> I'm thinking something like:
> 
> #define RET_PUSH(s, val)				\
> do {							\
> 	(s) -= sizeof(val);				\
> 	*(typeof(val) *)(s) = (val);			\
> } while (0)
> 
> #define RET_POP(s, type)				\
> ({							\
> 	type *__ptr = (void *)(s);			\
> 	(s) += sizeof(type);				\
> 	*__ptr;						\
> })
> 
> Would be clearer?

Due to races with interrupts, and this not being an atomic operation, I
had to play tricks with moving the stack pointer and adding data to it.
So I wanted to keep the changing of the stack pointer and adding and
retrieving of the stack data separate.
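
One way to picture the concern, as a userspace sketch with a simulated
interrupt: if the index is advanced first and the slot is filled in
afterwards, a nested push from an interrupt lands above the slot rather than
on top of it. The ordering shown is an assumption modelled on the description
above, and all demo_* names are hypothetical.

```c
#include <assert.h>

#define DEMO_DEPTH 8
#define demo_barrier() __asm__ __volatile__("" ::: "memory")

static long demo_slots[DEMO_DEPTH];
static int demo_index = -1;		/* index of the current top entry */

/* Stand-in for an interrupt that traces one function and returns. */
static void demo_interrupt(void)
{
	int idx = ++demo_index;

	assert(idx < DEMO_DEPTH);
	demo_barrier();
	demo_slots[idx] = -1;		/* the interrupt's own entry */
	demo_index--;			/* popped before the interrupt returns */
}

/* Safe order: claim the slot, then fill it in. */
static long demo_push_index_first(long val)
{
	int idx = ++demo_index;

	assert(idx + 1 < DEMO_DEPTH);
	demo_barrier();
	demo_slots[idx] = val;
	demo_interrupt();		/* nested push goes to idx + 1 */
	return demo_slots[idx];
}

/* Unsafe order: fill the slot in before claiming it. */
static long demo_push_data_first(long val)
{
	int idx = demo_index + 1;

	assert(idx < DEMO_DEPTH);
	demo_slots[idx] = val;
	demo_interrupt();		/* nested push reuses idx, clobbers val */
	demo_index = idx;
	return demo_slots[idx];
}
```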

Later patches remove the RET_STACK_INC/DEC() anyway.

Thanks for taking the time to look at these patches!

-- Steve



* Re: [RFC][PATCH 03/14 v2] function_graph: Allow multiple users to attach to function graph
  2019-05-24 11:26   ` Peter Zijlstra
@ 2019-05-24 12:12     ` Steven Rostedt
  2019-05-24 12:27       ` Peter Zijlstra
  0 siblings, 1 reply; 31+ messages in thread
From: Steven Rostedt @ 2019-05-24 12:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Namhyung Kim,
	Frank Ch. Eigler

On Fri, 24 May 2019 13:26:08 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> But but but but.. why not add all the required bits to the shadow stack
> in the first place and do away with the array entirely?

What required bits would that be? The pointer to the fgraph_ops,
because we need that to pass to the calling function.

> 
> So on ret, just keep POP'ing until either the stack is empty or the
> entry is for another function.

When we hit a fgraph_ops, how do we know if it was freed or not? We
can't just blindly reference it.

The idea of the array, is that we can maintain state in a single
location of when the fgraph_ops is freed. If we return from a function,
we have an index and a counter, and if the counter doesn't match with
what's in the array, then we know that the fgraph_ops is no longer
around and we just drop it.

The reason for the array, is to keep track of if the fgraph_ops has
been freed or not. Otherwise, when we unregister the fgraph_ops, we
would need to search all shadow stacks, looking for it to unreference
it.

Believe me, I'd rather not have that array, but I couldn't come up with a
better solution to handle freeing of fgraph_ops.
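
The index-plus-counter check can be modelled in userspace roughly as follows.
This is a sketch of the idea described above rather than the code in the
series; the slot layout, the demo_* names, and the way the counter is bumped
on unregister are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ARRAY_SIZE 16	/* the series allows 16 users */

typedef void (*demo_retfunc_t)(int *hits);

struct demo_slot {
	demo_retfunc_t retfunc;	/* NULL when the slot is free */
	unsigned int counter;	/* bumped each time the slot is released */
};

/* What the entry path would record on the shadow stack. */
struct demo_entry {
	int index;
	unsigned int counter;
};

static struct demo_slot demo_array[DEMO_ARRAY_SIZE];

static int demo_register(demo_retfunc_t fn)
{
	for (int i = 0; i < DEMO_ARRAY_SIZE; i++) {
		if (!demo_array[i].retfunc) {
			demo_array[i].retfunc = fn;
			return i;
		}
	}
	return -1;		/* array full */
}

static void demo_unregister(int index)
{
	assert(index >= 0 && index < DEMO_ARRAY_SIZE);
	demo_array[index].retfunc = NULL;
	demo_array[index].counter++;	/* invalidates outstanding entries */
}

/* Entry: remember which slot, and which generation of it. */
static struct demo_entry demo_on_entry(int index)
{
	struct demo_entry e = { index, demo_array[index].counter };

	return e;
}

/* Return: call back only if the slot still holds the same registration. */
static int demo_on_return(struct demo_entry e, int *hits)
{
	struct demo_slot *slot = &demo_array[e.index];

	if (!slot->retfunc || slot->counter != e.counter)
		return 0;	/* ops went away while the function slept */
	slot->retfunc(hits);
	return 1;
}

static void demo_count_hit(int *hits)
{
	(*hits)++;
}
```

The point of the counter is that a slot reused by a later registration is
still told apart from the one that was live at function entry, without
walking every task's shadow stack at unregister time.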

-- Steve


* Re: [RFC][PATCH 03/14 v2] function_graph: Allow multiple users to attach to function graph
  2019-05-24 12:12     ` Steven Rostedt
@ 2019-05-24 12:27       ` Peter Zijlstra
  2019-05-24 12:57         ` Steven Rostedt
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2019-05-24 12:27 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Namhyung Kim,
	Frank Ch. Eigler

On Fri, May 24, 2019 at 08:12:19AM -0400, Steven Rostedt wrote:
> On Fri, 24 May 2019 13:26:08 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > But but but but.. why not add all the required bits to the shadow stack
> > in the first place and do away with the array entirely?
> 
> What required bits would that be? The pointer to the fgraph_ops,
> because we need that to pass to the calling function.

I was thinking a smaller structure comprising of {func,callback}, which
you pop, if func matches, run callback.
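
A userspace sketch of that pop-until-mismatch scheme, assuming each shadow
stack entry is a {func, callback} pair. The struct and all demo_* names are
hypothetical; nothing here is taken from the series itself.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*demo_callback_t)(unsigned long func, int *calls);

struct demo_ret_entry {
	unsigned long func;		/* address of the traced function */
	demo_callback_t callback;	/* return handler for that function */
};

#define DEMO_STACK_DEPTH 32

static struct demo_ret_entry demo_stack[DEMO_STACK_DEPTH];
static int demo_top;			/* number of entries in use */

static int demo_push(unsigned long func, demo_callback_t cb)
{
	assert(cb != NULL);
	if (demo_top >= DEMO_STACK_DEPTH)
		return -1;		/* stack full: stop tracing */
	demo_stack[demo_top].func = func;
	demo_stack[demo_top].callback = cb;
	demo_top++;
	return 0;
}

/* On return from 'func': pop and run until empty or another function. */
static int demo_return_from(unsigned long func, int *calls)
{
	int popped = 0;

	while (demo_top > 0 && demo_stack[demo_top - 1].func == func) {
		struct demo_ret_entry e = demo_stack[--demo_top];

		e.callback(func, calls);
		popped++;
	}
	return popped;
}

static void demo_count(unsigned long func, int *calls)
{
	(void)func;
	(*calls)++;
}
```

Note that matching on the function address alone would also pop entries
belonging to an outer recursive call of the same function, so this sketch
would need some per-frame marker to handle recursion.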

> > So on ret, just keep POP'ing until either the stack is empty or the
> > entry is for another function.
> 
> When we hit a fgraph_ops, how do we know if it was freed or not? We
> can't just blindly reference it.
> 
> The idea of the array, is that we can maintain state in a single
> location of when the fgraph_ops is freed. If we return from a function,
> we have an index and a counter, and if the counter doesn't match with
> what's in the array, then we know that the fgraph_ops is no longer
> around and we just drop it.
> 
> The reason for the array, is to keep track of if the fgraph_ops has
> been freed or not. Otherwise, when we unregister the fgraph_ops, we
> would need to search all shadow stacks, looking for it to unreference
> it.
> 
> Believe me, I'd rather not have that array, but I couldn't come up with a
> better solution to handle freeing of fgraph_ops.

The trivial answer would be to refcount the thing, but can't we make
rcu_tasks do this?

And delay the unreg until all active users are gone -- who gives a crap
that can take a while.


* Re: [RFC][PATCH 03/14 v2] function_graph: Allow multiple users to attach to function graph
  2019-05-24 12:27       ` Peter Zijlstra
@ 2019-05-24 12:57         ` Steven Rostedt
  2019-05-27 10:10           ` Peter Zijlstra
  0 siblings, 1 reply; 31+ messages in thread
From: Steven Rostedt @ 2019-05-24 12:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Namhyung Kim,
	Frank Ch. Eigler

On Fri, 24 May 2019 14:27:24 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> > Believe me, I'd rather not have that array, but I couldn't come up with a
> > better solution to handle freeing of fgraph_ops.  
> 
> The trivial answer would be to refcount the thing, but can't we make
> rcu_tasks do this?

But wouldn't refcounts require atomic operations, something that would
be excruciatingly slow for something that runs on all functions.

rcu_tasks doesn't cross voluntary sleeps, which this does.

> 
> And delay the unreg until all active users are gone -- who gives a crap
> that can take a while.

It could literally be forever (well, until the machine reboots). And
something that could appear to be a memory leak, although a very slow
one. But probably be hard to have more than the number of tasks on the
system.

-- Steve



* Re: [RFC][PATCH 03/14 v2] function_graph: Allow multiple users to attach to function graph
  2019-05-24 12:57         ` Steven Rostedt
@ 2019-05-27 10:10           ` Peter Zijlstra
  2019-05-27 11:08             ` Steven Rostedt
  0 siblings, 1 reply; 31+ messages in thread
From: Peter Zijlstra @ 2019-05-27 10:10 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Namhyung Kim,
	Frank Ch. Eigler

On Fri, May 24, 2019 at 08:57:44AM -0400, Steven Rostedt wrote:
> On Fri, 24 May 2019 14:27:24 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > > Believe me, I'd rather not have that array, but I couldn't come up with a
> > > better solution to handle freeing of fgraph_ops.  
> > 
> > The trivial answer would be to refcount the thing, but can't we make
> > rcu_tasks do this?
> 
> But wouldn't refcounts require atomic operations, something that would
> be excruciatingly slow for something that runs on all functions.

Obviously, which is why I suggested something else :-)

> rcu_tasks doesn't cross voluntary sleeps, which this does.

Sure, but we can 'fix' that, surely. Alternatively we use SRCU, or
something else, a blend between SRCU and percpu-rwsem for example, SRCU
has that annoying smp_mb() on the read side, where percpu-rwsem doesn't
have that.

> > And delay the unreg until all active users are gone -- who gives a crap
> > that can take a while.
> 
> It could literally be forever (well, until the machine reboots). And
> something that could appear to be a memory leak, although a very slow
> one. But probably be hard to have more than the number of tasks on the
> system.

Again, who cares.. ? How often do you have return trace functions that
disappear, afaict that only happens with modules, and neither
function_graph_trace nor kprobes are modules.

It'll just mean the module unload will be stuck, possibly forever.
That's not something I care about. Also, if we _really_ care, we can
mandate that module users use some sort of ugly trampoline that covers
their asses at the cost of some performance.

Getting rid of that array makes this code far saner (and I suspect
faster too).


* Re: [RFC][PATCH 03/14 v2] function_graph: Allow multiple users to attach to function graph
  2019-05-27 10:10           ` Peter Zijlstra
@ 2019-05-27 11:08             ` Steven Rostedt
  0 siblings, 0 replies; 31+ messages in thread
From: Steven Rostedt @ 2019-05-27 11:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Masami Hiramatsu, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Namhyung Kim,
	Frank Ch. Eigler

On Mon, 27 May 2019 12:10:04 +0200
Peter Zijlstra <peterz@infradead.org> wrote:


> > rcu_tasks doesn't cross voluntary sleeps, which this does.  
> 
> Sure, but we can 'fix' that, surely.

Well, that's the point of the rcu_tasks. To let us know when a task has
voluntarily slept. I don't think we want to "fix" that.

> Alternatively we use SRCU, or
> something else, a blend between SRCU and percpu-rwsem for example, SRCU
> has that annoying smp_mb() on the read side, where percpu-rwsem doesn't
> have that.
> 
> > > And delay the unreg until all active users are gone -- who gives a crap
> > > that can take a while.  
> > 
> > It could literally be forever (well, until the machine reboots). And
> > something that could appear to be a memory leak, although a very slow
> > one. But probably be hard to have more than the number of tasks on the
> > system.  
> 
> Again, who cares.. ? How often do you have return trace functions that
> disappear, afaict that only happens with modules, and neither
> function_graph_trace nor kprobes are modules.
> 
> It'll just mean the module unload will be stuck, possibly forever.
> That's not something I care about. Also, if we _really_ care, we can
> mandate that module users use some sort of ugly trampoline that covers
> their asses at the cost of some performance.
> 
> Getting rid of that array makes this code far saner (and I suspect
> faster too).

The array is not the complex part of this. It was probably the easiest
part of this patch series. It just shows up a lot in the beginning
because I needed it to work before doing anything else. The more
difficult parts came with the passing of data from entry to exit.

I plan on keeping the array for now, as it is just an internal
implementation detail, that gives us only a limitation of the array
size that is noticed outside of the function graph code. If we find
some kind of RCU alternative, then we can switch to that in the
future and remove the array limitation.

-- Steve


* Re: [RFC][PATCH 00/14 v2] function_graph: Rewrite to allow multiple users
  2019-05-22 14:40   ` Steven Rostedt
@ 2019-05-29  6:47     ` Masami Hiramatsu
  2019-05-29  9:25       ` Steven Rostedt
  0 siblings, 1 reply; 31+ messages in thread
From: Masami Hiramatsu @ 2019-05-29  6:47 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Namhyung Kim,
	Frank Ch. Eigler

On Wed, 22 May 2019 10:40:27 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Wed, 22 May 2019 23:19:55 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> > >  void *fgraph_reserve_data(int size_in_bytes)
> > > 
> > >     Allows the entry function to reserve up to 4 words of data on
> > >     the shadow stack. On success, a pointer to the contents is returned.
> > >     This may be only called once per entry function.
> > > 
> > >  void *fgraph_retrieve_data(void)
> > > 
> > >     Allows the return function to retrieve the reserved data that was
> > >     allocated by the entry function.  
> > 
> > Nice! This seems good for kretprobe too. I'll review and try to port
> > kretprobe on this framework.
> 
> If you'd rather pull from my git repo than download all the patches,
> they are currently available in my ftrace/fgraph-multi branch.

Hi Steve,

I found that these interfaces seem tightly coupled with fgraph_ops. But that
causes a problem when I'm using it from kretprobe.

kretprobe has 2 handlers, entry handler and return handler, and both need
pt_regs. But fgraph_ops's entryfunc and retfunc do not pass the pt_regs.
That is the biggest issue for me on these APIs.
Can we expand fgraph_ops with regs parameter?

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: [RFC][PATCH 00/14 v2] function_graph: Rewrite to allow multiple users
  2019-05-29  6:47     ` Masami Hiramatsu
@ 2019-05-29  9:25       ` Steven Rostedt
  2019-05-30  9:29         ` Masami Hiramatsu
  0 siblings, 1 reply; 31+ messages in thread
From: Steven Rostedt @ 2019-05-29  9:25 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Namhyung Kim,
	Frank Ch. Eigler

On Wed, 29 May 2019 15:47:40 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:


> Hi Steve,
> 
> I found that these interfaces seem tightly coupled with fgraph_ops. But that
> causes a problem when I'm using it from kretprobe.

I was thinking that the kretprobes could use the fgraph_ops like
kprobes uses ftrace_ops.

> 
> kretprobe has 2 handlers, entry handler and return handler, and both need
> pt_regs. But fgraph_ops's entryfunc and retfunc do not pass the pt_regs.
> That is the biggest issue for me on these APIs.
> Can we expand fgraph_ops with regs parameter?

Ug. Yeah, of course you need that :-/

OK, so this series isn't enough to allow kretprobes to use it yet. OK,
I plan on still keeping it because it does allow for placing function
graph tracer into instances with their own filters.

I'll look into adding a REGS flag like we do with ftrace_ops.
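
As a rough illustration of what an opt-in flag could look like, here is a
userspace sketch. ftrace_ops has a real FTRACE_OPS_FL_SAVE_REGS flag, but
everything below (DEMO_FL_SAVE_REGS, struct demo_ops, struct demo_regs, and
the callback signature) is hypothetical and not an API from this series.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_FL_SAVE_REGS 0x1UL		/* hypothetical opt-in flag */

struct demo_regs {			/* stand-in for struct pt_regs */
	unsigned long ip;
	unsigned long ret_val;
};

struct demo_ops;
typedef int (*demo_entry_t)(struct demo_ops *ops, unsigned long func,
			    struct demo_regs *regs);

struct demo_ops {
	demo_entry_t entryfunc;
	unsigned long flags;
};

/* Hand registers only to users that asked for them. */
static int demo_call_entry(struct demo_ops *ops, unsigned long func,
			   struct demo_regs *regs)
{
	assert(ops && ops->entryfunc);
	if (!(ops->flags & DEMO_FL_SAVE_REGS))
		regs = NULL;
	return ops->entryfunc(ops, func, regs);
}

/* Example callback: reports whether it was given registers. */
static int demo_has_regs(struct demo_ops *ops, unsigned long func,
			 struct demo_regs *regs)
{
	(void)ops;
	(void)func;
	return regs != NULL;
}
```

Users that do not set the flag never see the registers, so the cost of
saving them would only be paid by callers such as kretprobes that need them.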

Does the return need all regs? Or is just the return code good enough?

-- Steve



* Re: [RFC][PATCH 00/14 v2] function_graph: Rewrite to allow multiple users
  2019-05-29  9:25       ` Steven Rostedt
@ 2019-05-30  9:29         ` Masami Hiramatsu
  2019-06-08  6:23           ` Steven Rostedt
  0 siblings, 1 reply; 31+ messages in thread
From: Masami Hiramatsu @ 2019-05-30  9:29 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Namhyung Kim,
	Frank Ch. Eigler

On Wed, 29 May 2019 05:25:21 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Wed, 29 May 2019 15:47:40 +0900
> Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> 
> > Hi Steve,
> > 
> > I found that these interfaces seem tightly coupled with fgraph_ops. But that
> > causes a problem when I'm using it from kretprobe.
> 
> I was thinking that the kretprobes could use the fgraph_ops like
> kprobes uses ftrace_ops.
> 
> > 
> > kretprobe has 2 handlers, entry handler and return handler, and both need
> > pt_regs. But fgraph_ops's entryfunc and retfunc do not pass the pt_regs.
> > That is the biggest issue for me on these APIs.
> > Can we expand fgraph_ops with regs parameter?
> 
> Ug. Yeah, of course you need that :-/
> 
> OK, so this series isn't enough to allow kretprobes to use it yet. OK,
> I plan on still keeping it because it does allow for placing function
> graph tracer into instances with their own filters.

OK, that will be a "regs" extension.
> 
> I'll look into adding a REGS flag like we do with ftrace_ops.
> 
> Does the return need all regs? Or is just the return code good enough?

Since it depends on the arch, I think we need all regs. And for the entry
handler, we need all.

Thank you,


-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: [RFC][PATCH 01/14 v2] function_graph: Convert ret_stack to a series of longs
  2019-05-24 12:05     ` Steven Rostedt
@ 2019-06-03 11:30       ` Masami Hiramatsu
  2019-06-04  9:04         ` Steven Rostedt
  0 siblings, 1 reply; 31+ messages in thread
From: Masami Hiramatsu @ 2019-06-03 11:30 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Peter Zijlstra, linux-kernel, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Masami Hiramatsu, Josh Poimboeuf,
	Frederic Weisbecker, Joel Fernandes, Andy Lutomirski,
	Mark Rutland, Namhyung Kim, Frank Ch. Eigler

On Fri, 24 May 2019 08:05:53 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> > 
> > > +#define SHADOW_STACK_SIZE (PAGE_SIZE)  
> > 
> > Do we really need that big a shadow stack?
> 
> Well, this is a sticky point. I allow up to 16 users at a time
> (although I can't imagine more than 5, but you never know), and each
> user adds a long and up to 4 more words (which is probably unlikely
> anyway). And then we can have deep call stacks (we are getting deeper
> each release it seems).
> 
> I figured, I start with a page size, and then in the future we can make
> it dynamic, or shrink it if it proves to be too much.

I'd prefer dynamic allocation, based on the number of users or actual
stack starvation.


Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: [RFC][PATCH 01/14 v2] function_graph: Convert ret_stack to a series of longs
  2019-06-03 11:30       ` Masami Hiramatsu
@ 2019-06-04  9:04         ` Steven Rostedt
  0 siblings, 0 replies; 31+ messages in thread
From: Steven Rostedt @ 2019-06-04  9:04 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Peter Zijlstra, linux-kernel, Ingo Molnar, Andrew Morton,
	Thomas Gleixner, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Namhyung Kim,
	Frank Ch. Eigler

On Mon, 3 Jun 2019 20:30:49 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> > >   
> > > > +#define SHADOW_STACK_SIZE (PAGE_SIZE)    
> > > 
> > > Do we really need that big a shadow stack?  
> > 
> > Well, this is a sticky point. I allow up to 16 users at a time
> > (although I can't imagine more than 5, but you never know), and each
> > user adds a long and up to 4 more words (which is probably unlikely
> > anyway). And then we can have deep call stacks (we are getting deeper
> > each release it seems).
> > 
> > I figured, I start with a page size, and then in the future we can make
> > it dynamic, or shrink it if it proves to be too much.  
> 
> I'd prefer dynamic allocation, based on the number of users or actual
> stack starvation.

As stated, it's something we can improve on in the future. I'll
probably be pushing out this series for linux-next, and then we can
incrementally improve it.

First on my list is to add a REGS version of function_graph such that
kretprobes can use it ;-)

-- Steve


* Re: [RFC][PATCH 00/14 v2] function_graph: Rewrite to allow multiple users
  2019-05-30  9:29         ` Masami Hiramatsu
@ 2019-06-08  6:23           ` Steven Rostedt
  0 siblings, 0 replies; 31+ messages in thread
From: Steven Rostedt @ 2019-06-08  6:23 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner,
	Peter Zijlstra, Josh Poimboeuf, Frederic Weisbecker,
	Joel Fernandes, Andy Lutomirski, Mark Rutland, Namhyung Kim,
	Frank Ch. Eigler

On Thu, 30 May 2019 18:29:20 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:

> > OK, so this series isn't enough to allow kretprobes to use it yet. OK,
> > I plan on still keeping it because it does allow for placing function
> > graph tracer into instances with their own filters.  
> 
> OK, that will be a "regs" extension.
> > 
> > I'll look into adding a REGS flag like we do with ftrace_ops.
> > 
> > Does the return need all regs? Or is just the return code good enough?  
> 
> Since it depends on the arch, I think we need all regs. And for the entry
> handler, we need all.

When I get back home, I'm going to push this series to linux-next. And
then I'll start adding the REGS extension for the next Linux release
after this series gets in.

Thanks for looking at it.

-- Steve



