LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Song Liu <songliubraving@fb.com>
To: Kairui Song <kasong@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
lkml <linux-kernel@vger.kernel.org>,
Kernel Team <Kernel-team@fb.com>,
"Josh Poimboeuf" <jpoimboe@redhat.com>,
Alexei Starovoitov <ast@kernel.org>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"bpf@vger.kernel.org" <bpf@vger.kernel.org>
Subject: Re: Getting empty callchain from perf_callchain_kernel()
Date: Mon, 20 May 2019 17:16:07 +0000 [thread overview]
Message-ID: <366B9124-728E-4985-9649-FDA64CDDF1FB@fb.com> (raw)
In-Reply-To: <CACPcB9cpNp5CBqoRs+XMCwufzAFa8Pj-gbmj9fb+g5wVdue=ig@mail.gmail.com>
> On May 19, 2019, at 11:06 AM, Kairui Song <kasong@redhat.com> wrote:
>
> On Fri, May 17, 2019 at 5:10 PM Peter Zijlstra <peterz@infradead.org> wrote:
>>
>> On Fri, May 17, 2019 at 04:15:39PM +0800, Kairui Song wrote:
>>> Hi, I think the actual problem is that bpf_get_stackid_tp (and maybe
>>> some other bfp functions) is now broken, or, strating an unwind
>>> directly inside a bpf program will end up strangely. It have following
>>> kernel message:
>>
>> Urgh, what is that bpf_get_stackid_tp() doing to get the regs? I can't
>> follow.
>
> bpf_get_stackid_tp will just use the regs passed to it from the trace
> point. And then it will eventually call perf_get_callchain to get the
> call chain.
> With a tracepoint we have the fake regs, so unwinder will start from
> where it is called, and use the fake regs as the indicator of the
> target frame it want, and keep unwinding until reached the actually
> callsite.
>
> But if the stack trace is started withing a bpf func call then it's broken...
>
> If the unwinder could trace back through the bpf func call then there
> will be no such problem.
>
> For frame pointer unwinder, set the indicator flag (X86_EFLAGS_FIXED)
> before bpf call, and ensure bp is also dumped could fix it (so it will
> start using the regs for bpf calls, like before the commit
> d15d356887e7). But for ORC I don't see a clear way to fix the problem.
> First though is maybe dump some callee's regs for ORC (IP, BP, SP, DI,
> DX, R10, R13, else?) in the trace point handler, then use the flag to
> indicate ORC to do one more unwind (because can't get caller's regs,
> so get callee's regs instaed) before actually giving output?
>
> I had a try, for framepointer unwinder, mark the indicator flag before
> calling bpf functions, and dump bp as well in the trace point. Then
> with frame pointer, it works, test passed:
>
> diff --git a/arch/x86/include/asm/perf_event.h
> b/arch/x86/include/asm/perf_event.h
> index 1392d5e6e8d6..6f1192e9776b 100644
> --- a/arch/x86/include/asm/perf_event.h
> +++ b/arch/x86/include/asm/perf_event.h
> @@ -302,12 +302,25 @@ extern unsigned long perf_misc_flags(struct
> pt_regs *regs);
>
> #include <asm/stacktrace.h>
>
> +#ifdef CONFIG_FRAME_POINTER
> +static inline unsigned long caller_frame_pointer(void)
> +{
> + return (unsigned long)__builtin_frame_address(1);
> +}
> +#else
> +static inline unsigned long caller_frame_pointer(void)
> +{
> + return 0;
> +}
> +#endif
> +
> /*
> * We abuse bit 3 from flags to pass exact information, see perf_misc_flags
> * and the comment with PERF_EFLAGS_EXACT.
> */
> #define perf_arch_fetch_caller_regs(regs, __ip) { \
> (regs)->ip = (__ip); \
> + (regs)->bp = caller_frame_pointer(); \
> (regs)->sp = (unsigned long)__builtin_frame_address(0); \
> (regs)->cs = __KERNEL_CS; \
> regs->flags = 0; \
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index abbd4b3b96c2..ca7b95ee74f0 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8549,6 +8549,7 @@ void perf_trace_run_bpf_submit(void *raw_data,
> int size, int rctx,
> struct task_struct *task)
> {
> if (bpf_prog_array_valid(call)) {
> + regs->flags |= X86_EFLAGS_FIXED;
> *(struct pt_regs **)raw_data = regs;
> if (!trace_call_bpf(call, raw_data) || hlist_empty(head)) {
> perf_swevent_put_recursion_context(rctx);
> @@ -8822,6 +8823,8 @@ static void bpf_overflow_handler(struct perf_event *event,
> int ret = 0;
>
> ctx.regs = perf_arch_bpf_user_pt_regs(regs);
> + ctx.regs->flags |= X86_EFLAGS_FIXED;
> +
> preempt_disable();
> if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1))
> goto out;
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index f92d6ad5e080..e1fa656677dc 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -497,6 +497,8 @@ u64 bpf_event_output(struct bpf_map *map, u64
> flags, void *meta, u64 meta_size,
> };
>
> perf_fetch_caller_regs(regs);
> + regs->flags |= X86_EFLAGS_FIXED;
> +
> perf_sample_data_init(sd, 0, 0);
> sd->raw = &raw;
>
> @@ -831,6 +833,8 @@ BPF_CALL_5(bpf_perf_event_output_raw_tp, struct
> bpf_raw_tracepoint_args *, args,
> struct pt_regs *regs = this_cpu_ptr(&bpf_raw_tp_regs);
>
> perf_fetch_caller_regs(regs);
> + regs->flags |= X86_EFLAGS_FIXED;
> +
> return ____bpf_perf_event_output(regs, map, flags, data, size);
> }
>
> @@ -851,6 +855,8 @@ BPF_CALL_3(bpf_get_stackid_raw_tp, struct
> bpf_raw_tracepoint_args *, args,
> struct pt_regs *regs = this_cpu_ptr(&bpf_raw_tp_regs);
>
> perf_fetch_caller_regs(regs);
> + regs->flags |= X86_EFLAGS_FIXED;
> +
> /* similar to bpf_perf_event_output_tp, but pt_regs fetched
> differently */
> return bpf_get_stackid((unsigned long) regs, (unsigned long) map,
> flags, 0, 0);
> @@ -871,6 +877,8 @@ BPF_CALL_4(bpf_get_stack_raw_tp, struct
> bpf_raw_tracepoint_args *, args,
> struct pt_regs *regs = this_cpu_ptr(&bpf_raw_tp_regs);
>
> perf_fetch_caller_regs(regs);
> + regs->flags |= X86_EFLAGS_FIXED;
> +
> return bpf_get_stack((unsigned long) regs, (unsigned long) buf,
> (unsigned long) size, flags, 0);
> }
>
> And *_raw_tp functions will fetch the regs by themselves so a bit
> trouble some...
>
> ----------
>
> And another approach is to make unwinder direct unwinding works when
> called by bpf (if possible and reasonable). I also had a nasty hacky
> experiment (posted below) to just force frame pointer unwinder's
> get_stack_info pass for bpf, then problem is fixed without any other
> workaround:
>
> diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
> index 753b8cfe8b8a..c0cfdf25f5ed 100644
> --- a/arch/x86/kernel/dumpstack_64.c
> +++ b/arch/x86/kernel/dumpstack_64.c
> @@ -166,7 +166,8 @@ int get_stack_info(unsigned long *stack, struct
> task_struct *task,
> if (in_entry_stack(stack, info))
> goto recursion_check;
>
> - goto unknown;
> + goto recursion_check;
>
> recursion_check:
> /*
>
> Don't know how to do it the right way, or is it even possible for all
> unwinders yet...
>
> --
> Best Regards,
> Kairui Song
next prev parent reply other threads:[~2019-05-20 17:17 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-05-16 23:51 Song Liu
2019-05-17 7:46 ` Peter Zijlstra
2019-05-17 8:10 ` Peter Zijlstra
2019-05-17 8:15 ` Kairui Song
2019-05-17 8:32 ` Kairui Song
2019-05-17 16:22 ` Song Liu
2019-05-17 9:10 ` Peter Zijlstra
2019-05-17 18:40 ` Song Liu
2019-05-17 21:06 ` Alexei Starovoitov
2019-05-17 21:48 ` Song Liu
2019-05-19 18:07 ` Kairui Song
2019-05-20 17:22 ` Song Liu
2019-05-22 13:51 ` Peter Zijlstra
2019-05-19 18:06 ` Kairui Song
2019-05-20 17:16 ` Song Liu [this message]
2019-05-20 17:19 ` Song Liu
2019-05-22 14:02 ` Peter Zijlstra
2019-05-22 14:49 ` Alexei Starovoitov
2019-05-22 17:45 ` Josh Poimboeuf
2019-05-22 23:46 ` Josh Poimboeuf
2019-05-23 6:48 ` Kairui Song
2019-05-23 8:27 ` Song Liu
2019-05-23 9:11 ` Kairui Song
2019-05-23 13:32 ` Josh Poimboeuf
2019-05-23 14:50 ` Kairui Song
2019-05-23 15:24 ` Josh Poimboeuf
2019-05-23 16:41 ` Kairui Song
2019-05-23 17:27 ` Josh Poimboeuf
2019-05-24 2:20 ` Kairui Song
2019-05-24 23:23 ` Josh Poimboeuf
2019-05-27 11:57 ` Kairui Song
2019-06-06 16:04 ` Song Liu
2019-06-06 23:58 ` Josh Poimboeuf
2019-06-11 21:03 ` Josh Poimboeuf
2019-05-24 8:53 ` Peter Zijlstra
2019-05-24 13:05 ` Josh Poimboeuf
2019-06-12 3:05 ` Josh Poimboeuf
2019-06-12 8:54 ` Peter Zijlstra
2019-06-12 14:50 ` Josh Poimboeuf
2019-06-13 20:26 ` Josh Poimboeuf
2019-06-12 13:10 ` Steven Rostedt
2019-06-12 14:26 ` Josh Poimboeuf
2019-05-22 18:07 ` Josh Poimboeuf
2019-05-22 21:55 ` Alexei Starovoitov
2019-05-17 16:32 ` Song Liu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=366B9124-728E-4985-9649-FDA64CDDF1FB@fb.com \
--to=songliubraving@fb.com \
--cc=Kernel-team@fb.com \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=jpoimboe@redhat.com \
--cc=kasong@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=peterz@infradead.org \
--subject='Re: Getting empty callchain from perf_callchain_kernel()' \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).