LKML Archive on lore.kernel.org
From: Song Liu <songliubraving@fb.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "open list:BPF (Safe dynamic programs and tools)" 
	<bpf@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"acme@kernel.org" <acme@kernel.org>,
	"mingo@redhat.com" <mingo@redhat.com>,
	Kernel Team <Kernel-team@fb.com>,
	Kan Liang <kan.liang@linux.intel.com>,
	"Like Xu" <like.xu@linux.intel.com>,
	Alexey Budankov <alexey.budankov@linux.intel.com>
Subject: Re: [RFC] bpf: lbr: enable reading LBR from tracing bpf programs
Date: Thu, 19 Aug 2021 18:22:07 +0000
Message-ID: <A5F7CF90-27F9-476C-B87C-CAD2A6BE5DA4@fb.com>
In-Reply-To: <YR6dreGQSe4oQFBr@hirez.programming.kicks-ass.net>

Hi Peter, 

> On Aug 19, 2021, at 11:06 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> 
> On Thu, Aug 19, 2021 at 04:46:20PM +0000, Song Liu wrote:
>>> void perf_inject_event(struct perf_event *event, struct pt_regs *regs)
>>> {
>>> 	struct perf_sample_data data;
>>> 	struct pmu *pmu = event->pmu;
>>> 	unsigned long flags;
>>> 
>>> 	local_irq_save(flags);
>>> 	perf_pmu_disable(pmu);
>>> 
>>> 	perf_sample_data_init(&data, 0, 0);
>>> 	/*
>>> 	 * XXX or a variant with more _ that starts at the overflow
>>> 	 * handler...
>>> 	 */
>>> 	__perf_event_overflow(event, 0, &data, regs);
>>> 
>>> 	perf_pmu_enable(pmu);
>>> 	local_irq_restore(flags);
>>> }
>>> 
>>> But please consider carefully, I haven't...
>> 
>> Hmm... This is a little weird to me. 
>> IIUC, we need to call perf_inject_event() after the software event, say
>> a kretprobe, triggers. So it is going to look like:
>> 
>>  1. kretprobe trigger;
>>  2. handler calls perf_inject_event();
>>  3. PMI kicks in, and saves LBR;
> 
> This doesn't actually happen. I overlooked the fact that we need the PMI
> to fill out @data for us.
> 
>>  4. after the PMI, consumer of LBR uses the saved data;
> 
> Normal overflow handler will have data->br_stack set, but I now realize
> that the 'pseudo' code above will not get that. We need to somehow get
> the arch bits involved; again :/
> 
>> However, given perf_inject_event() disables PMU, we can just save the LBR
>> right there? And it should be a lot easier? Something like:
>> 
>>  1. kretprobe triggers;
>>  2. handler calls perf_snapshot_lbr();
>>     2.1 perf_pmu_disable(pmu);
>>     2.2 saves LBR 
>>     2.3 perf_pmu_enable(pmu);
>>  3. consumer of LBR uses the saved data;
>> 
>> What is the downside of this approach? 
> 
> It would be perf_snapshot_branch_stack() and would require a new
> (optional) pmu::method to set up the branch stack.

I guess it would look like:

diff --git i/include/linux/perf_event.h w/include/linux/perf_event.h
index fe156a8170aa3..af379b7f18050 100644
--- i/include/linux/perf_event.h
+++ w/include/linux/perf_event.h
@@ -514,6 +514,9 @@ struct pmu {
         * Check period value for PERF_EVENT_IOC_PERIOD ioctl.
         */
        int (*check_period)             (struct perf_event *event, u64 value); /* optional */
+
+       int (*snapshot_branch_stack)    (struct perf_event *event, /* TBD, maybe struct
+                                                                     perf_output_handle? */);
 };

 enum perf_addr_filter_action_t {
diff --git i/kernel/events/core.c w/kernel/events/core.c
index 2d1e63dd97f23..14aa5f7bccf1f 100644
--- i/kernel/events/core.c
+++ w/kernel/events/core.c
@@ -1207,6 +1207,19 @@ void perf_pmu_enable(struct pmu *pmu)
                pmu->pmu_enable(pmu);
 }

+int perf_snapshot_branch_stack(struct perf_event *event)
+{
+       struct pmu *pmu = event->pmu;
+       int ret;
+
+       if (!pmu->snapshot_branch_stack)
+               return -EOPNOTSUPP;
+       perf_pmu_disable(pmu);
+       ret = pmu->snapshot_branch_stack(event, ...);
+       perf_pmu_enable(pmu);
+       return ret;
+}
+
 static DEFINE_PER_CPU(struct list_head, active_ctx_list);
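
To make the intended flow concrete outside the kernel, here is a minimal
user-space sketch of the same pattern: an optional callback that is only
called while the PMU is paused. Everything named mock_* is hypothetical
and exists only for illustration; the buffer/count arguments are an
assumption that stands in for the still-undecided "TBD" parameters, and
this is not a proposal for the real signature.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these are kernel types. */
#define MOCK_MAX_BRANCHES 4

struct mock_branch_entry {
	unsigned long from;
	unsigned long to;
};

struct mock_pmu {
	int disable_count;	/* > 0 while the mock PMU is paused */
	struct mock_branch_entry lbr[MOCK_MAX_BRANCHES];
	/* optional: copy up to cnt entries into buf, return number copied */
	int (*snapshot_branch_stack)(struct mock_pmu *pmu,
				     struct mock_branch_entry *buf,
				     unsigned int cnt);
};

static void mock_pmu_disable(struct mock_pmu *pmu)
{
	pmu->disable_count++;
}

static void mock_pmu_enable(struct mock_pmu *pmu)
{
	/* catch an enable without a matching disable */
	assert(pmu->disable_count > 0);
	pmu->disable_count--;
}

/* Example callback: refuses to run unless the mock PMU is paused. */
static int mock_lbr_snapshot(struct mock_pmu *pmu,
			     struct mock_branch_entry *buf,
			     unsigned int cnt)
{
	unsigned int i;

	if (pmu->disable_count <= 0)
		return -EBUSY;
	if (cnt > MOCK_MAX_BRANCHES)
		cnt = MOCK_MAX_BRANCHES;
	for (i = 0; i < cnt; i++)
		buf[i] = pmu->lbr[i];
	return (int)cnt;
}

/* Wrapper: pause, call the optional callback, resume. */
static int mock_snapshot_branch_stack(struct mock_pmu *pmu,
				      struct mock_branch_entry *buf,
				      unsigned int cnt)
{
	int ret;

	if (!pmu->snapshot_branch_stack)
		return -EOPNOTSUPP;

	mock_pmu_disable(pmu);
	ret = pmu->snapshot_branch_stack(pmu, buf, cnt);
	mock_pmu_enable(pmu);

	return ret;	/* propagate the callback's result to the caller */
}
```

A caller would check for -EOPNOTSUPP first and otherwise treat a
non-negative return value as the number of entries copied into its
buffer.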

> 
> And if we're going to be adding new pmu::methods then I figure one that
> does the whole sample state might be more useful.

What do you mean by "whole sample state"? To integrate with the existing
perf_sample_data, like perf_output_sample()?

Thanks,
Song
