LKML Archive on lore.kernel.org help / color / mirror / Atom feed
* [RFC] perf/core: what is exclude_idle supposed to do @ 2018-04-16 22:04 Stephane Eranian 2018-04-17 6:20 ` Jiri Olsa ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: Stephane Eranian @ 2018-04-16 22:04 UTC (permalink / raw) To: LKML Cc: Peter Zijlstra, Jiri Olsa, Arnaldo Carvalho de Melo, mingo, Andi Kleen, Vince Weaver Hi, I am trying to understand what the exclude_idle event attribute is supposed to accomplish. As per the definition in the header file: exclude_idle : 1, /* don't count when idle */ Naively, I thought it would simply stop the event when running in the context of the idle task (swapper, pid 0) on any CPU. That would seem to match the succinct description. However, running a simple: $ perf record -a -e cycles:I sleep 5 perf_event_attr: sample_type IP|TID|TIME|CPU|PERIOD read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID exclude_idle 1 on an idle machine, showed me that this is not what is actually happening: $ perf script swapper 0 [000] 1132634.287442: 1 cycles:I: ffffffff8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms]) swapper 0 [001] 1132634.287498: 1 cycles:I: ffffffff8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms]) After looking at the code, it all made sense, it seems to current implementation is only relevant for sw events. I don't understand why. I am left wondering what is the goal of exclude_idle? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] perf/core: what is exclude_idle supposed to do 2018-04-16 22:04 [RFC] perf/core: what is exclude_idle supposed to do Stephane Eranian @ 2018-04-17 6:20 ` Jiri Olsa 2018-04-18 15:10 ` Vince Weaver 2018-04-17 13:40 ` Arnaldo Carvalho de Melo 2018-04-20 8:35 ` Peter Zijlstra 2 siblings, 1 reply; 9+ messages in thread From: Jiri Olsa @ 2018-04-17 6:20 UTC (permalink / raw) To: Stephane Eranian Cc: LKML, Peter Zijlstra, Arnaldo Carvalho de Melo, mingo, Andi Kleen, Vince Weaver On Mon, Apr 16, 2018 at 10:04:53PM +0000, Stephane Eranian wrote: > Hi, > > I am trying to understand what the exclude_idle event attribute is supposed > to accomplish. > As per the definition in the header file: > > exclude_idle : 1, /* don't count when idle */ > > Naively, I thought it would simply stop the event when running in the > context of the idle task (swapper, pid 0) on any CPU. That would seem to > match the succinct description. > > However, running a simple: > > $ perf record -a -e cycles:I sleep 5 > perf_event_attr: > sample_type IP|TID|TIME|CPU|PERIOD > read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID > exclude_idle 1 > > on an idle machine, showed me that this is not what is actually happening: > $ perf script > swapper 0 [000] 1132634.287442: 1 cycles:I: > ffffffff8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms]) > swapper 0 [001] 1132634.287498: 1 cycles:I: > ffffffff8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms]) > > > After looking at the code, it all made sense, it seems to current > implementation is only relevant for sw events. I don't understand why. AFAICS it's not implemented Peter suggested some time ago change for cpu-clock (attached) I still have it in my queue, because it gives funny numbers sometimes.. and I haven't figured it out so far the idea so far was to use cpu-clock,cpu-clock:I events to count and display idle % in perf top/record and stat metrics jirka --- >From 7f20b047cf56f8086d79007175592633798656ce Mon Sep 17 00:00:00 2001 From: Peter Zijlstra <peterz@infradead.org> Date: Thu, 23 Nov 2017 16:15:36 +0100 Subject: [PATCH] perf: Add support to monitor idle time on cpu-clock Adding support to use perf_event_attr::exclude_idle (the :I modifierr in perf tools)to monitor idle time on cpu-clock event. Link: http://lkml.kernel.org/n/tip-jw45kl6ydrhzqx4uxws0qsqb@git.kernel.org Signed-off-by: Jiri Olsa <jolsa@kernel.org> --- kernel/events/core.c | 22 ++++++++++++++++++---- kernel/sched/idle_task.c | 17 +++++++++++++++++ 2 files changed, 35 insertions(+), 4 deletions(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index 2989e3b0fe8b..62c0dc75ed11 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -9391,19 +9391,33 @@ static void perf_swevent_init_hrtimer(struct perf_event *event) * Software event: cpu wall time clock */ +static u64 cpu_clock_count(struct perf_event *event) +{ + u64 now = local_clock(); + + if (event->attr.exclude_idle) + now -= idle_task(event->oncpu)->se.sum_exec_runtime; + + return now; +} + static void cpu_clock_event_update(struct perf_event *event) { - s64 prev; + s64 prev, delta; u64 now; - now = local_clock(); + now = cpu_clock_count(event); prev = local64_xchg(&event->hw.prev_count, now); - local64_add(now - prev, &event->count); + delta = now - prev; + if (delta > 0) + local64_add(delta, &event->count); } static void cpu_clock_event_start(struct perf_event *event, int flags) { - local64_set(&event->hw.prev_count, local_clock()); + u64 now = cpu_clock_count(event); + + local64_set(&event->hw.prev_count, now); perf_swevent_start_hrtimer(event); } diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c index d518664cce4f..e360ce5dd9a2 100644 --- a/kernel/sched/idle_task.c +++ b/kernel/sched/idle_task.c @@ -27,9 +27,14 @@ static void check_preempt_curr_idle(struct rq *rq, struct task_struct *p, int fl static struct task_struct * pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct rq_flags *rf) { + struct task_struct *idle = rq->idle; + put_prev_task(rq, prev); update_idle_core(rq); schedstat_inc(rq->sched_goidle); + + idle->se.exec_start = rq_clock_task(rq); + return rq->idle; } @@ -48,6 +53,15 @@ dequeue_task_idle(struct rq *rq, struct task_struct *p, int flags) static void put_prev_task_idle(struct rq *rq, struct task_struct *prev) { + struct task_struct *idle = rq->idle; + u64 delta, now; + + now = rq_clock_task(rq); + delta = now - idle->se.exec_start; + if (likely((s64)delta > 0)) + idle->se.sum_exec_runtime += delta; + idle->se.exec_start = now; + rq_last_tick_reset(rq); } @@ -57,6 +71,9 @@ static void task_tick_idle(struct rq *rq, struct task_struct *curr, int queued) static void set_curr_task_idle(struct rq *rq) { + struct task_struct *idle = rq->idle; + + idle->se.exec_start = rq_clock_task(rq); } static void switched_to_idle(struct rq *rq, struct task_struct *p) -- 2.13.6 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] perf/core: what is exclude_idle supposed to do 2018-04-17 6:20 ` Jiri Olsa @ 2018-04-18 15:10 ` Vince Weaver 2018-04-20 8:36 ` Peter Zijlstra 0 siblings, 1 reply; 9+ messages in thread From: Vince Weaver @ 2018-04-18 15:10 UTC (permalink / raw) To: Jiri Olsa Cc: Stephane Eranian, LKML, Peter Zijlstra, Arnaldo Carvalho de Melo, mingo, Andi Kleen, Vince Weaver On Tue, 17 Apr 2018, Jiri Olsa wrote: > On Mon, Apr 16, 2018 at 10:04:53PM +0000, Stephane Eranian wrote: > > Hi, > > > > I am trying to understand what the exclude_idle event attribute is supposed > > to accomplish. > > As per the definition in the header file: > > > > exclude_idle : 1, /* don't count when idle */ > > AFAICS it's not implemented so just to be completely clear hear, we're saying that the "exclude_idle" modifier has never done anything useful and still doesn't? If so I should update the perf_event_open manpage to spell this out. Vince ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] perf/core: what is exclude_idle supposed to do 2018-04-18 15:10 ` Vince Weaver @ 2018-04-20 8:36 ` Peter Zijlstra 2018-04-20 14:18 ` Vince Weaver 0 siblings, 1 reply; 9+ messages in thread From: Peter Zijlstra @ 2018-04-20 8:36 UTC (permalink / raw) To: Vince Weaver Cc: Jiri Olsa, Stephane Eranian, LKML, Arnaldo Carvalho de Melo, mingo, Andi Kleen On Wed, Apr 18, 2018 at 11:10:20AM -0400, Vince Weaver wrote: > On Tue, 17 Apr 2018, Jiri Olsa wrote: > > > On Mon, Apr 16, 2018 at 10:04:53PM +0000, Stephane Eranian wrote: > > > Hi, > > > > > > I am trying to understand what the exclude_idle event attribute is supposed > > > to accomplish. > > > As per the definition in the header file: > > > > > > exclude_idle : 1, /* don't count when idle */ > > > > AFAICS it's not implemented > > so just to be completely clear hear, we're saying that the "exclude_idle" > modifier has never done anything useful and still doesn't? AFAICT it works on Power and possibly ARM. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] perf/core: what is exclude_idle supposed to do 2018-04-20 8:36 ` Peter Zijlstra @ 2018-04-20 14:18 ` Vince Weaver 2018-04-20 16:51 ` Vince Weaver 0 siblings, 1 reply; 9+ messages in thread From: Vince Weaver @ 2018-04-20 14:18 UTC (permalink / raw) To: Peter Zijlstra Cc: Vince Weaver, Jiri Olsa, Stephane Eranian, LKML, Arnaldo Carvalho de Melo, mingo, Andi Kleen On Fri, 20 Apr 2018, Peter Zijlstra wrote: > On Wed, Apr 18, 2018 at 11:10:20AM -0400, Vince Weaver wrote: > > On Tue, 17 Apr 2018, Jiri Olsa wrote: > > > > > On Mon, Apr 16, 2018 at 10:04:53PM +0000, Stephane Eranian wrote: > > > > Hi, > > > > > > > > I am trying to understand what the exclude_idle event attribute is supposed > > > > to accomplish. > > > > As per the definition in the header file: > > > > > > > > exclude_idle : 1, /* don't count when idle */ > > > > > > AFAICS it's not implemented > > > > so just to be completely clear hear, we're saying that the "exclude_idle" > > modifier has never done anything useful and still doesn't? > > AFAICT it works on Power and possibly ARM. at least some ARMs are a bit more honest about it than x86 ivybridge: Performance counter stats for '/bin/ls': 1,368,162 instructions 1,368,162 instructions:I pi2/ARM cortex-A7 Performance counter stats for '/bin/ls': 1,910,083 instructions <not supported> instructions:I I'd fire up my Power8 machine to see but not sure it's worth the hassle and/or having to get out the ear protection. Vince ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] perf/core: what is exclude_idle supposed to do 2018-04-20 14:18 ` Vince Weaver @ 2018-04-20 16:51 ` Vince Weaver 2018-04-20 18:19 ` Stephane Eranian 0 siblings, 1 reply; 9+ messages in thread From: Vince Weaver @ 2018-04-20 16:51 UTC (permalink / raw) To: Vince Weaver Cc: Peter Zijlstra, Jiri Olsa, Stephane Eranian, LKML, Arnaldo Carvalho de Melo, mingo, Andi Kleen On Fri, 20 Apr 2018, Vince Weaver wrote: > > AFAICT it works on Power and possibly ARM. > > at least some ARMs are a bit more honest about it than x86 > > ivybridge: > Performance counter stats for '/bin/ls': > 1,368,162 instructions > 1,368,162 instructions:I > > pi2/ARM cortex-A7 > Performance counter stats for '/bin/ls': > 1,910,083 instructions > <not supported> instructions:I > > I'd fire up my Power8 machine to see but not sure it's worth the hassle > and/or having to get out the ear protection. I did power up the Power8 machine in the end: power8: perf stat -e cycles,cycles:I sleep 5 Performance counter stats for 'sleep 5': 14,271,273 cycles 14,271,273 cycles:I ??? But then if I try again on power8 perf stat -a -e cycles,cycles:I sleep 5 Performance counter stats for 'system wide': 1,238,772,322,327 cycles 1,238,674,771,713 cycles:I there is a difference. But then on ivybridge perf stat -a -e cycles,cycles:I sleep 5 Performance counter stats for 'system wide': 589,598,104 cycles 589,537,190 cycles:I there is also a different in system wide mode. So maybe exclude_idle does do something on x86? Or am I completely misunderstanding what the flag is supposed to be indicating? Vince ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] perf/core: what is exclude_idle supposed to do 2018-04-20 16:51 ` Vince Weaver @ 2018-04-20 18:19 ` Stephane Eranian 0 siblings, 0 replies; 9+ messages in thread From: Stephane Eranian @ 2018-04-20 18:19 UTC (permalink / raw) To: Vince Weaver Cc: Peter Zijlstra, Jiri Olsa, LKML, Arnaldo Carvalho de Melo, mingo, Andi Kleen On Fri, Apr 20, 2018 at 9:51 AM Vince Weaver <vincent.weaver@maine.edu> wrote: > On Fri, 20 Apr 2018, Vince Weaver wrote: > > > AFAICT it works on Power and possibly ARM. > > > > at least some ARMs are a bit more honest about it than x86 > > > > ivybridge: > > Performance counter stats for '/bin/ls': > > 1,368,162 instructions > > 1,368,162 instructions:I > > > > pi2/ARM cortex-A7 > > Performance counter stats for '/bin/ls': > > 1,910,083 instructions > > <not supported> instructions:I > > > > I'd fire up my Power8 machine to see but not sure it's worth the hassle > > and/or having to get out the ear protection. > I did power up the Power8 machine in the end: > power8: > perf stat -e cycles,cycles:I sleep 5 > Performance counter stats for 'sleep 5': > 14,271,273 cycles > 14,271,273 cycles:I > ??? > But then if I try again on power8 > perf stat -a -e cycles,cycles:I sleep 5 > Performance counter stats for 'system wide': > 1,238,772,322,327 cycles > 1,238,674,771,713 cycles:I > there is a difference. > But then on ivybridge > perf stat -a -e cycles,cycles:I sleep 5 > Performance counter stats for 'system wide': > 589,598,104 cycles > 589,537,190 cycles:I This may be noise. The way the flag is named leads me to believe its goal is to not count when executing in the context of the idle task (pid 0 on each CPU). However, it does not seem to be implemented that way. If you were to implement this, then in system wide mode you'd have to check the incoming task on ctxsw, very much like we do in cgroup monitoring. So it would not be totally free. One can argue, in sampling mode you can eliminate the samples coming from PID=0 in the tool. But there would be nothing to cover counting mode. Interestingly, there is also code in perf tool to exclude known idle routines from reporting. But this is targeted to only some routines that the idle task may end up executing. so it is not quite the same. > So maybe exclude_idle does do something on x86? Or am I completely > misunderstanding what the flag is supposed to be indicating? > Vince ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] perf/core: what is exclude_idle supposed to do 2018-04-16 22:04 [RFC] perf/core: what is exclude_idle supposed to do Stephane Eranian 2018-04-17 6:20 ` Jiri Olsa @ 2018-04-17 13:40 ` Arnaldo Carvalho de Melo 2018-04-20 8:35 ` Peter Zijlstra 2 siblings, 0 replies; 9+ messages in thread From: Arnaldo Carvalho de Melo @ 2018-04-17 13:40 UTC (permalink / raw) To: Stephane Eranian Cc: LKML, Peter Zijlstra, Jiri Olsa, Ingo Molnar, Andi Kleen, Vince Weaver, acme Em Mon, Apr 16, 2018 at 10:04:53PM +0000, Stephane Eranian escreveu: > Hi, > > I am trying to understand what the exclude_idle event attribute is supposed > to accomplish. > As per the definition in the header file: > > exclude_idle : 1, /* don't count when idle */ > > Naively, I thought it would simply stop the event when running in the > context of the idle task (swapper, pid 0) on any CPU. That would seem to > match the succinct description. > > However, running a simple: > > $ perf record -a -e cycles:I sleep 5 > perf_event_attr: > sample_type IP|TID|TIME|CPU|PERIOD > read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID > exclude_idle 1 > > on an idle machine, showed me that this is not what is actually happening: > $ perf script > swapper 0 [000] 1132634.287442: 1 cycles:I: > ffffffff8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms]) > swapper 0 [001] 1132634.287498: 1 cycles:I: > ffffffff8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms]) > > > After looking at the code, it all made sense, it seems to current > implementation is only relevant for sw events. I don't understand why. > > I am left wondering what is the goal of exclude_idle? To do something it is not doing, i.e. to do what you expected it to do. There were messages exchanged recently where PeterZ said that this is just that needs fixing. - Arnaldo ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] perf/core: what is exclude_idle supposed to do 2018-04-16 22:04 [RFC] perf/core: what is exclude_idle supposed to do Stephane Eranian 2018-04-17 6:20 ` Jiri Olsa 2018-04-17 13:40 ` Arnaldo Carvalho de Melo @ 2018-04-20 8:35 ` Peter Zijlstra 2 siblings, 0 replies; 9+ messages in thread From: Peter Zijlstra @ 2018-04-20 8:35 UTC (permalink / raw) To: Stephane Eranian Cc: LKML, Jiri Olsa, Arnaldo Carvalho de Melo, mingo, Andi Kleen, Vince Weaver On Mon, Apr 16, 2018 at 10:04:53PM +0000, Stephane Eranian wrote: > Hi, > > I am trying to understand what the exclude_idle event attribute is supposed > to accomplish. > As per the definition in the header file: > > exclude_idle : 1, /* don't count when idle */ > > Naively, I thought it would simply stop the event when running in the > context of the idle task (swapper, pid 0) on any CPU. That would seem to > match the succinct description. > > However, running a simple: > > $ perf record -a -e cycles:I sleep 5 > perf_event_attr: > sample_type IP|TID|TIME|CPU|PERIOD > read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID > exclude_idle 1 > > on an idle machine, showed me that this is not what is actually happening: > $ perf script > swapper 0 [000] 1132634.287442: 1 cycles:I: > ffffffff8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms]) > swapper 0 [001] 1132634.287498: 1 cycles:I: > ffffffff8100b1fb __intel_pmu_enable_all.isra.17 ([kernel.kallsyms]) > > > After looking at the code, it all made sense, it seems to current > implementation is only relevant for sw events. I don't understand why. > > I am left wondering what is the goal of exclude_idle? A "git grep exclude_idle" seems to suggest powerpc/arm have it immplemented for their PMU. If we then look at commit: 2743a5b0fa6f ("perfcounters: provide expansion room in the ABI") It was Paul who introduced the bit. So I'm thinking that if x86 doesn't implement it, we should at least error out on it. Of course, so far we've allowed it, so who knows what all will break if we start asserting that :/ ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2018-04-20 18:19 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-04-16 22:04 [RFC] perf/core: what is exclude_idle supposed to do Stephane Eranian 2018-04-17 6:20 ` Jiri Olsa 2018-04-18 15:10 ` Vince Weaver 2018-04-20 8:36 ` Peter Zijlstra 2018-04-20 14:18 ` Vince Weaver 2018-04-20 16:51 ` Vince Weaver 2018-04-20 18:19 ` Stephane Eranian 2018-04-17 13:40 ` Arnaldo Carvalho de Melo 2018-04-20 8:35 ` Peter Zijlstra
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).