LKML Archive on lore.kernel.org
* [RFC 0/3] preempt_tracer: Fix preempt_disable tracepoint
@ 2019-05-28 15:16 Daniel Bristot de Oliveira
  2019-05-28 15:16 ` [RFC 1/3] softirq: Use preempt_latency_stop/start to trace preemption Daniel Bristot de Oliveira
                     ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-05-28 15:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: williams, daniel, Steven Rostedt (VMware), Ingo Molnar,
      Peter Zijlstra, Thomas Gleixner, Paul E. McKenney, Matthias Kaehlcke,
      Joel Fernandes (Google), Frederic Weisbecker, Yangtao Li,
      Tommaso Cucinotta

While playing with the model + working on a fix for the task context &
trace recursion, I ended up hitting two cases in which the
preempt_disable/enable tracepoint was supposed to happen, but didn't.

There is an explanation for each case in the log message.

This is an RFC exposing the problem, with possible ways to fix it. But I
bet there might be better solutions as well, so this is a real RFC.

Daniel Bristot de Oliveira (3):
  softirq: Use preempt_latency_stop/start to trace preemption
  preempt_tracer: Disable IRQ while starting/stopping due to a
    preempt_counter change
  preempt_tracer: Use a percpu variable to control traceable calls

 kernel/sched/core.c | 66 ++++++++++++++++++++++++++++++++++-----------
 kernel/softirq.c    | 13 ++++-----
 2 files changed, 55 insertions(+), 24 deletions(-)

--
2.20.1

^ permalink raw reply	[flat|nested] 27+ messages in thread
* [RFC 1/3] softirq: Use preempt_latency_stop/start to trace preemption 2019-05-28 15:16 [RFC 0/3] preempt_tracer: Fix preempt_disable tracepoint Daniel Bristot de Oliveira @ 2019-05-28 15:16 ` Daniel Bristot de Oliveira 2019-05-29 8:29 ` Peter Zijlstra 2019-05-29 9:30 ` Joel Fernandes 2019-05-28 15:16 ` [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change Daniel Bristot de Oliveira 2019-05-28 15:16 ` [RFC 3/3] preempt_tracer: Use a percpu variable to control traceble calls Daniel Bristot de Oliveira 2 siblings, 2 replies; 27+ messages in thread From: Daniel Bristot de Oliveira @ 2019-05-28 15:16 UTC (permalink / raw) To: linux-kernel Cc: williams, daniel, Steven Rostedt (VMware), Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Paul E. McKenney, Matthias Kaehlcke, Joel Fernandes (Google), Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta prempt_disable/enable tracepoints occurs only in the preemption enabled <-> disable transition. As preempt_latency_stop() and preempt_latency_start() already do this control, avoid code duplication by using these functions in the softirq code as well. RFC: Should we move preempt_latency_start/preempt_latency_stop to a trace source file... or even a header? Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Yangtao Li <tiny.windzz@gmail.com> Cc: Tommaso Cucinotta <tommaso.cucinotta@santannapisa.it> Cc: linux-kernel@vger.kernel.org --- kernel/sched/core.c | 4 ++-- kernel/softirq.c | 13 +++++-------- 2 files changed, 7 insertions(+), 10 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 874c427742a9..8c0b414e45dc 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -3152,7 +3152,7 @@ static inline void sched_tick_stop(int cpu) { } * If the value passed in is equal to the current preempt count * then we just disabled preemption. Start timing the latency. */ -static inline void preempt_latency_start(int val) +void preempt_latency_start(int val) { if (preempt_count() == val) { unsigned long ip = get_lock_parent_ip(); @@ -3189,7 +3189,7 @@ NOKPROBE_SYMBOL(preempt_count_add); * If the value passed in equals to the current preempt count * then we just enabled preemption. Stop timing the latency. 
*/ -static inline void preempt_latency_stop(int val) +void preempt_latency_stop(int val) { if (preempt_count() == val) trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip()); diff --git a/kernel/softirq.c b/kernel/softirq.c index 2c3382378d94..c9ad89c3dfed 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -108,6 +108,8 @@ static bool ksoftirqd_running(unsigned long pending) * where hardirqs are disabled legitimately: */ #ifdef CONFIG_TRACE_IRQFLAGS +extern void preempt_latency_start(int val); +extern void preempt_latency_stop(int val); void __local_bh_disable_ip(unsigned long ip, unsigned int cnt) { unsigned long flags; @@ -130,12 +132,8 @@ void __local_bh_disable_ip(unsigned long ip, unsigned int cnt) trace_softirqs_off(ip); raw_local_irq_restore(flags); - if (preempt_count() == cnt) { -#ifdef CONFIG_DEBUG_PREEMPT - current->preempt_disable_ip = get_lock_parent_ip(); -#endif - trace_preempt_off(CALLER_ADDR0, get_lock_parent_ip()); - } + preempt_latency_start(cnt); + } EXPORT_SYMBOL(__local_bh_disable_ip); #endif /* CONFIG_TRACE_IRQFLAGS */ @@ -144,8 +142,7 @@ static void __local_bh_enable(unsigned int cnt) { lockdep_assert_irqs_disabled(); - if (preempt_count() == cnt) - trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip()); + preempt_latency_stop(cnt); if (softirq_count() == (cnt & SOFTIRQ_MASK)) trace_softirqs_on(_RET_IP_); -- 2.20.1 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [RFC 1/3] softirq: Use preempt_latency_stop/start to trace preemption 2019-05-28 15:16 ` [RFC 1/3] softirq: Use preempt_latency_stop/start to trace preemption Daniel Bristot de Oliveira @ 2019-05-29 8:29 ` Peter Zijlstra 2019-05-29 9:30 ` Joel Fernandes 1 sibling, 0 replies; 27+ messages in thread From: Peter Zijlstra @ 2019-05-29 8:29 UTC (permalink / raw) To: Daniel Bristot de Oliveira Cc: linux-kernel, williams, daniel, Steven Rostedt (VMware), Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Matthias Kaehlcke, Joel Fernandes (Google), Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta On Tue, May 28, 2019 at 05:16:22PM +0200, Daniel Bristot de Oliveira wrote: > prempt_disable/enable tracepoints occurs only in the preemption > enabled <-> disable transition. As preempt_latency_stop() and > preempt_latency_start() already do this control, avoid code > duplication by using these functions in the softirq code as well. > > RFC: Should we move preempt_latency_start/preempt_latency_stop > to a trace source file... or even a header? Yeah, a header might not be a bad idea. > @@ -130,12 +132,8 @@ void __local_bh_disable_ip(unsigned long ip, unsigned int cnt) > trace_softirqs_off(ip); > raw_local_irq_restore(flags); > > - if (preempt_count() == cnt) { > -#ifdef CONFIG_DEBUG_PREEMPT > - current->preempt_disable_ip = get_lock_parent_ip(); > -#endif > - trace_preempt_off(CALLER_ADDR0, get_lock_parent_ip()); > - } > + preempt_latency_start(cnt); > + I'm thinking we can do without that empty line. > } > EXPORT_SYMBOL(__local_bh_disable_ip); > #endif /* CONFIG_TRACE_IRQFLAGS */ ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC 1/3] softirq: Use preempt_latency_stop/start to trace preemption 2019-05-28 15:16 ` [RFC 1/3] softirq: Use preempt_latency_stop/start to trace preemption Daniel Bristot de Oliveira 2019-05-29 8:29 ` Peter Zijlstra @ 2019-05-29 9:30 ` Joel Fernandes 2019-05-29 12:22 ` Steven Rostedt 1 sibling, 1 reply; 27+ messages in thread From: Joel Fernandes @ 2019-05-29 9:30 UTC (permalink / raw) To: Daniel Bristot de Oliveira Cc: linux-kernel, williams, daniel, Steven Rostedt (VMware), Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Paul E. McKenney, Matthias Kaehlcke, Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta On Tue, May 28, 2019 at 05:16:22PM +0200, Daniel Bristot de Oliveira wrote: > prempt_disable/enable tracepoints occurs only in the preemption > enabled <-> disable transition. As preempt_latency_stop() and > preempt_latency_start() already do this control, avoid code > duplication by using these functions in the softirq code as well. > > RFC: Should we move preempt_latency_start/preempt_latency_stop > to a trace source file... or even a header? Yes, I think so. Also this patch changes CALLER_ADDR0 passed to the tracepoint because there's one more level of a non-inlined function call in the call chain right? Very least the changelog should document this change in functional behavior, IMO. Looks like a nice change otherwise, thanks! > Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> > Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> > Cc: Matthias Kaehlcke <mka@chromium.org> > Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org> > Cc: Frederic Weisbecker <frederic@kernel.org> > Cc: Yangtao Li <tiny.windzz@gmail.com> > Cc: Tommaso Cucinotta <tommaso.cucinotta@santannapisa.it> > Cc: linux-kernel@vger.kernel.org > --- > kernel/sched/core.c | 4 ++-- > kernel/softirq.c | 13 +++++-------- > 2 files changed, 7 insertions(+), 10 deletions(-) > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 874c427742a9..8c0b414e45dc 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -3152,7 +3152,7 @@ static inline void sched_tick_stop(int cpu) { } > * If the value passed in is equal to the current preempt count > * then we just disabled preemption. Start timing the latency. > */ > -static inline void preempt_latency_start(int val) > +void preempt_latency_start(int val) > { > if (preempt_count() == val) { > unsigned long ip = get_lock_parent_ip(); > @@ -3189,7 +3189,7 @@ NOKPROBE_SYMBOL(preempt_count_add); > * If the value passed in equals to the current preempt count > * then we just enabled preemption. Stop timing the latency. 
> */ > -static inline void preempt_latency_stop(int val) > +void preempt_latency_stop(int val) > { > if (preempt_count() == val) > trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip()); > diff --git a/kernel/softirq.c b/kernel/softirq.c > index 2c3382378d94..c9ad89c3dfed 100644 > --- a/kernel/softirq.c > +++ b/kernel/softirq.c > @@ -108,6 +108,8 @@ static bool ksoftirqd_running(unsigned long pending) > * where hardirqs are disabled legitimately: > */ > #ifdef CONFIG_TRACE_IRQFLAGS > +extern void preempt_latency_start(int val); > +extern void preempt_latency_stop(int val); > void __local_bh_disable_ip(unsigned long ip, unsigned int cnt) > { > unsigned long flags; > @@ -130,12 +132,8 @@ void __local_bh_disable_ip(unsigned long ip, unsigned int cnt) > trace_softirqs_off(ip); > raw_local_irq_restore(flags); > > - if (preempt_count() == cnt) { > -#ifdef CONFIG_DEBUG_PREEMPT > - current->preempt_disable_ip = get_lock_parent_ip(); > -#endif > - trace_preempt_off(CALLER_ADDR0, get_lock_parent_ip()); > - } > + preempt_latency_start(cnt); > + > } > EXPORT_SYMBOL(__local_bh_disable_ip); > #endif /* CONFIG_TRACE_IRQFLAGS */ > @@ -144,8 +142,7 @@ static void __local_bh_enable(unsigned int cnt) > { > lockdep_assert_irqs_disabled(); > > - if (preempt_count() == cnt) > - trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip()); > + preempt_latency_stop(cnt); > > if (softirq_count() == (cnt & SOFTIRQ_MASK)) > trace_softirqs_on(_RET_IP_); > -- > 2.20.1 > ^ permalink raw reply [flat|nested] 27+ messages in thread
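Joel's CALLER_ADDR0 remark is easy to reproduce outside the kernel: CALLER_ADDR0 ultimately expands to __builtin_return_address(0), which reports the return address of the frame it is evaluated in, so adding one non-inlined frame shifts which caller it names. The following stand-alone user-space sketch only illustrates that effect (function names are hypothetical, and it should be built with optimization, e.g. -O2, so the inline actually happens); it is not the kernel code path itself.

/*
 * Hypothetical illustration of how an extra non-inlined frame shifts
 * __builtin_return_address(0), which is what CALLER_ADDR0 boils down to.
 */
#include <stdio.h>

static inline void probe_inlined(void)
{
        /* Inlined into outer(): reports outer()'s own return address,
         * i.e. a location inside main(). */
        printf("inlined : %p\n", __builtin_return_address(0));
}

static __attribute__((noinline)) void probe_not_inlined(void)
{
        /* Separate frame: reports a location inside outer() instead. */
        printf("noinline: %p\n", __builtin_return_address(0));
}

void outer(void)
{
        probe_inlined();
        probe_not_inlined();
}

int main(void)
{
        outer();
        return 0;
}

The same shift happens to the first argument of trace_preempt_off() once preempt_latency_start() stops being inlined into its callers, which is why the __always_inline suggestion discussed below avoids the change in reported addresses.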
* Re: [RFC 1/3] softirq: Use preempt_latency_stop/start to trace preemption 2019-05-29 9:30 ` Joel Fernandes @ 2019-05-29 12:22 ` Steven Rostedt 2019-06-04 10:39 ` Daniel Bristot de Oliveira 0 siblings, 1 reply; 27+ messages in thread From: Steven Rostedt @ 2019-05-29 12:22 UTC (permalink / raw) To: Joel Fernandes Cc: Daniel Bristot de Oliveira, linux-kernel, williams, daniel, Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Paul E. McKenney, Matthias Kaehlcke, Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta On Wed, 29 May 2019 05:30:56 -0400 Joel Fernandes <joel@joelfernandes.org> wrote: > Yes, I think so. Also this patch changes CALLER_ADDR0 passed to the > tracepoint because there's one more level of a non-inlined function call > in the call chain right? Very least the changelog should document this > change in functional behavior, IMO. This sounds more like a break in behavior not a functional change. I guess moving it to a header and making it a static __always_inline should be fine though. -- Steve ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC 1/3] softirq: Use preempt_latency_stop/start to trace preemption 2019-05-29 12:22 ` Steven Rostedt @ 2019-06-04 10:39 ` Daniel Bristot de Oliveira 2019-06-04 12:01 ` Steven Rostedt 0 siblings, 1 reply; 27+ messages in thread From: Daniel Bristot de Oliveira @ 2019-06-04 10:39 UTC (permalink / raw) To: Steven Rostedt, Joel Fernandes Cc: Daniel Bristot de Oliveira, linux-kernel, williams, Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Paul E. McKenney, Matthias Kaehlcke, Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta On 29/05/2019 14:22, Steven Rostedt wrote: > On Wed, 29 May 2019 05:30:56 -0400 > Joel Fernandes <joel@joelfernandes.org> wrote: > >> Yes, I think so. Also this patch changes CALLER_ADDR0 passed to the >> tracepoint because there's one more level of a non-inlined function call >> in the call chain right? Very least the changelog should document this >> change in functional behavior, IMO. In practice I am seeing no change in the values printed, but there is another problem in this regard: there are cases in which both caller and parent have the same address. I am quite sure it has to do with the in_lock_function() behavior. Anyway, I was already planning to propose a cleanup in the in_lock_function/in_sched_function. I will investigate it more. > This sounds more like a break in behavior not a functional change. I > guess moving it to a header and making it a static __always_inline > should be fine though. Steve, which header should I use? Thanks! -- Daniel > -- Steve > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC 1/3] softirq: Use preempt_latency_stop/start to trace preemption 2019-06-04 10:39 ` Daniel Bristot de Oliveira @ 2019-06-04 12:01 ` Steven Rostedt 0 siblings, 0 replies; 27+ messages in thread From: Steven Rostedt @ 2019-06-04 12:01 UTC (permalink / raw) To: Daniel Bristot de Oliveira Cc: Joel Fernandes, linux-kernel, williams, Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Paul E. McKenney, Matthias Kaehlcke, Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta On Tue, 4 Jun 2019 12:39:03 +0200 Daniel Bristot de Oliveira <bristot@redhat.com> wrote: > > This sounds more like a break in behavior not a functional change. I > > guess moving it to a header and making it a static __always_inline > > should be fine though. > > Steve, which header should I use? I would say include/linux/preempt.h if Peter doesn't have any issues with that. -- Steve ^ permalink raw reply [flat|nested] 27+ messages in thread
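A minimal sketch of what that move could look like, assuming the preempt tracepoints and get_lock_parent_ip() can be made visible from include/linux/preempt.h (which is exactly the header-dependency question such a patch would have to settle); the config guard here is meant to mirror the one wrapping these helpers in kernel/sched/core.c at the time, and none of this is the eventual patch:

/*
 * Sketch only: the helpers as static __always_inline in
 * include/linux/preempt.h, so CALLER_ADDR0 keeps naming the real caller.
 */
#if defined(CONFIG_PREEMPT_TRACER) || \
        (defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_TRACE_PREEMPT_TOGGLE))
static __always_inline void preempt_latency_start(int val)
{
        /* The count just became 'val': preemption was just disabled. */
        if (preempt_count() == val) {
                unsigned long ip = get_lock_parent_ip();
#ifdef CONFIG_DEBUG_PREEMPT
                current->preempt_disable_ip = ip;
#endif
                trace_preempt_off(CALLER_ADDR0, ip);
        }
}

static __always_inline void preempt_latency_stop(int val)
{
        /* The count is about to drop from 'val' back to zero:
         * preemption is about to be re-enabled. */
        if (preempt_count() == val)
                trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip());
}
#else
static inline void preempt_latency_start(int val) { }
static inline void preempt_latency_stop(int val) { }
#endif

kernel/sched/core.c and kernel/softirq.c would then include the header and drop their local copies and the extern declarations added in patch 1.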
* [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change 2019-05-28 15:16 [RFC 0/3] preempt_tracer: Fix preempt_disable tracepoint Daniel Bristot de Oliveira 2019-05-28 15:16 ` [RFC 1/3] softirq: Use preempt_latency_stop/start to trace preemption Daniel Bristot de Oliveira @ 2019-05-28 15:16 ` Daniel Bristot de Oliveira 2019-05-29 8:33 ` Peter Zijlstra 2019-05-28 15:16 ` [RFC 3/3] preempt_tracer: Use a percpu variable to control traceble calls Daniel Bristot de Oliveira 2 siblings, 1 reply; 27+ messages in thread From: Daniel Bristot de Oliveira @ 2019-05-28 15:16 UTC (permalink / raw) To: linux-kernel Cc: williams, daniel, Steven Rostedt (VMware), Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Paul E. McKenney, Matthias Kaehlcke, Joel Fernandes (Google), Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta The preempt_disable/enable tracepoint only traces in the disable <-> enable case, which is correct. But think about this case: ---------------------------- %< ------------------------------ THREAD IRQ | | preempt_disable() { __preempt_count_add(1) -------> smp_apic_timer_interrupt() { preempt_disable() do not trace (preempt count >= 1) .... preempt_enable() do not trace (preempt count >= 1) } trace_preempt_disable(); } ---------------------------- >% ------------------------------ The tracepoint will be skipped. It is possible to observe this problem using this kernel module: http://bristot.me/files/efficient_verification/wip.tar.gz [*] and doing the following trace: # cd /sys/kernel/debug/tracing/ # echo 1 > snapshot # cat available_events | grep preempt_ > set_event # echo irq_vectors >> /sys/kernel/debug/tracing/set_event # insmod wip.ko /* wait for a snapshot creation */ # tail -100 snapshot [...] tail-5572 [001] ....1.. 2888.401184: preempt_enable: caller=_raw_spin_unlock_irqrestore+0x2a/0x70 parent= (null) tail-5572 [001] ....1.. 2888.401184: preempt_disable: caller=migrate_disable+0x8b/0x1e0 parent=migrate_disable+0x8b/0x1e0 tail-5572 [001] ....111 2888.401184: preempt_enable: caller=migrate_disable+0x12f/0x1e0 parent=migrate_disable+0x12f/0x1e0 tail-5572 [001] d..h212 2888.401189: local_timer_entry: vector=236 tail-5572 [001] dN.h512 2888.401192: process_event: event sched_waking not expected in the state preemptive tail-5572 [001] dN.h512 2888.401200: <stack trace> => process_event => __handle_event => ttwu_do_wakeup => try_to_wake_up => invoke_rcu_core => rcu_check_callbacks => update_process_times => tick_sched_handle => tick_sched_timer => __hrtimer_run_queues => hrtimer_interrupt => smp_apic_timer_interrupt => apic_timer_interrupt => trace_event_raw_event_preemptirq_template => trace_preempt_off => get_page_from_freelist => __alloc_pages_nodemask => __handle_mm_fault => handle_mm_fault => __do_page_fault => do_page_fault => async_page_fault To avoid skipping the trace, the change in the counter should be "atomic" with the start/stop, w.r.t the interrupts. Disable interrupts while the adding/starting stopping/subtracting. Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "Paul E. 
McKenney" <paulmck@linux.vnet.ibm.com> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Yangtao Li <tiny.windzz@gmail.com> Cc: Tommaso Cucinotta <tommaso.cucinotta@santannapisa.it> Cc: linux-kernel@vger.kernel.org --- [*] with some luck we will talk about this at the Plumbers. kernel/sched/core.c | 48 +++++++++++++++++++++++++++++++++------------ kernel/softirq.c | 2 +- 2 files changed, 37 insertions(+), 13 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 8c0b414e45dc..be4117d7384f 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -3163,6 +3163,16 @@ void preempt_latency_start(int val) } } +static inline void preempt_add_start_latency(int val) +{ + unsigned long flags; + + raw_local_irq_save(flags); + __preempt_count_add(val); + preempt_latency_start(val); + raw_local_irq_restore(flags); +} + void preempt_count_add(int val) { #ifdef CONFIG_DEBUG_PREEMPT @@ -3172,7 +3182,7 @@ void preempt_count_add(int val) if (DEBUG_LOCKS_WARN_ON((preempt_count() < 0))) return; #endif - __preempt_count_add(val); + preempt_add_start_latency(val); #ifdef CONFIG_DEBUG_PREEMPT /* * Spinlock count overflowing soon? @@ -3180,7 +3190,6 @@ void preempt_count_add(int val) DEBUG_LOCKS_WARN_ON((preempt_count() & PREEMPT_MASK) >= PREEMPT_MASK - 10); #endif - preempt_latency_start(val); } EXPORT_SYMBOL(preempt_count_add); NOKPROBE_SYMBOL(preempt_count_add); @@ -3195,6 +3204,16 @@ void preempt_latency_stop(int val) trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip()); } +static inline void preempt_sub_stop_latency(int val) +{ + unsigned long flags; + + raw_local_irq_save(flags); + preempt_latency_stop(val); + __preempt_count_sub(val); + raw_local_irq_restore(flags); +} + void preempt_count_sub(int val) { #ifdef CONFIG_DEBUG_PREEMPT @@ -3211,8 +3230,7 @@ void preempt_count_sub(int val) return; #endif - preempt_latency_stop(val); - __preempt_count_sub(val); + preempt_sub_stop_latency(val); } EXPORT_SYMBOL(preempt_count_sub); NOKPROBE_SYMBOL(preempt_count_sub); @@ -3220,6 +3238,16 @@ NOKPROBE_SYMBOL(preempt_count_sub); #else static inline void preempt_latency_start(int val) { } static inline void preempt_latency_stop(int val) { } + +static inline void preempt_sub_stop_latency(int val) +{ + __preempt_count_sub(val); +} + +static inline void preempt_add_start_latency(int val) +{ + __preempt_count_add(val); +} #endif static inline unsigned long get_preempt_disable_ip(struct task_struct *p) @@ -3585,11 +3613,9 @@ static void __sched notrace preempt_schedule_common(void) * traced. The other to still record the preemption latency, * which can also be traced by the function tracer. */ - preempt_disable_notrace(); - preempt_latency_start(1); + preempt_add_start_latency(1); __schedule(true); - preempt_latency_stop(1); - preempt_enable_no_resched_notrace(); + preempt_sub_stop_latency(1); /* * Check again in case we missed a preemption opportunity @@ -3653,8 +3679,7 @@ asmlinkage __visible void __sched notrace preempt_schedule_notrace(void) * traced. The other to still record the preemption latency, * which can also be traced by the function tracer. 
*/ - preempt_disable_notrace(); - preempt_latency_start(1); + preempt_add_start_latency(1); /* * Needs preempt disabled in case user_exit() is traced * and the tracer calls preempt_enable_notrace() causing @@ -3664,8 +3689,7 @@ asmlinkage __visible void __sched notrace preempt_schedule_notrace(void) __schedule(true); exception_exit(prev_ctx); - preempt_latency_stop(1); - preempt_enable_no_resched_notrace(); + preempt_sub_stop_latency(1); } while (need_resched()); } EXPORT_SYMBOL_GPL(preempt_schedule_notrace); diff --git a/kernel/softirq.c b/kernel/softirq.c index c9ad89c3dfed..9c64522ecc76 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -130,9 +130,9 @@ void __local_bh_disable_ip(unsigned long ip, unsigned int cnt) */ if (softirq_count() == (cnt & SOFTIRQ_MASK)) trace_softirqs_off(ip); - raw_local_irq_restore(flags); preempt_latency_start(cnt); + raw_local_irq_restore(flags); } EXPORT_SYMBOL(__local_bh_disable_ip); -- 2.20.1 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change 2019-05-28 15:16 ` [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change Daniel Bristot de Oliveira @ 2019-05-29 8:33 ` Peter Zijlstra 2019-05-29 9:40 ` Daniel Bristot de Oliveira 0 siblings, 1 reply; 27+ messages in thread From: Peter Zijlstra @ 2019-05-29 8:33 UTC (permalink / raw) To: Daniel Bristot de Oliveira Cc: linux-kernel, williams, daniel, Steven Rostedt (VMware), Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Matthias Kaehlcke, Joel Fernandes (Google), Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta On Tue, May 28, 2019 at 05:16:23PM +0200, Daniel Bristot de Oliveira wrote: > The preempt_disable/enable tracepoint only traces in the disable <-> enable > case, which is correct. But think about this case: > > ---------------------------- %< ------------------------------ > THREAD IRQ > | | > preempt_disable() { > __preempt_count_add(1) > -------> smp_apic_timer_interrupt() { > preempt_disable() > do not trace (preempt count >= 1) > .... > preempt_enable() > do not trace (preempt count >= 1) > } > trace_preempt_disable(); > } > ---------------------------- >% ------------------------------ > > The tracepoint will be skipped. .... for the IRQ. But IRQs are not preemptible anyway, so what the problem? > To avoid skipping the trace, the change in the counter should be "atomic" > with the start/stop, w.r.t the interrupts. > > Disable interrupts while the adding/starting stopping/subtracting. > +static inline void preempt_add_start_latency(int val) > +{ > + unsigned long flags; > + > + raw_local_irq_save(flags); > + __preempt_count_add(val); > + preempt_latency_start(val); > + raw_local_irq_restore(flags); > +} > +static inline void preempt_sub_stop_latency(int val) > +{ > + unsigned long flags; > + > + raw_local_irq_save(flags); > + preempt_latency_stop(val); > + __preempt_count_sub(val); > + raw_local_irq_restore(flags); > +} That is hideously expensive :/ ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change 2019-05-29 8:33 ` Peter Zijlstra @ 2019-05-29 9:40 ` Daniel Bristot de Oliveira 2019-05-29 10:20 ` Peter Zijlstra 2019-05-31 7:47 ` Joel Fernandes 0 siblings, 2 replies; 27+ messages in thread From: Daniel Bristot de Oliveira @ 2019-05-29 9:40 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, williams, daniel, Steven Rostedt (VMware), Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Matthias Kaehlcke, Joel Fernandes (Google), Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta On 29/05/2019 10:33, Peter Zijlstra wrote: > On Tue, May 28, 2019 at 05:16:23PM +0200, Daniel Bristot de Oliveira wrote: >> The preempt_disable/enable tracepoint only traces in the disable <-> enable >> case, which is correct. But think about this case: >> >> ---------------------------- %< ------------------------------ >> THREAD IRQ >> | | >> preempt_disable() { >> __preempt_count_add(1) >> -------> smp_apic_timer_interrupt() { >> preempt_disable() >> do not trace (preempt count >= 1) >> .... >> preempt_enable() >> do not trace (preempt count >= 1) >> } >> trace_preempt_disable(); >> } >> ---------------------------- >% ------------------------------ >> >> The tracepoint will be skipped. > > .... for the IRQ. But IRQs are not preemptible anyway, so what the > problem? right, they are. exposing my problem in a more specific way: To show in a model that an event always takes place with preemption disabled, but not necessarily with IRQs disabled, it is worth having the preemption disable events separated from IRQ disable ones. The main reason is that, although IRQs disabled postpone the execution of the scheduler, it is more pessimistic, as it also delays IRQs. So the more precise the model is, the less pessimistic the analysis will be. But there are other use-cases, for instance: (Steve, correct me if I am wrong) The preempt_tracer will not notice a "preempt disabled" section in an IRQ handler if the problem above happens. (Yeah, I know these problems are very specific... but...) >> To avoid skipping the trace, the change in the counter should be "atomic" >> with the start/stop, w.r.t the interrupts. >> >> Disable interrupts while the adding/starting stopping/subtracting. > >> +static inline void preempt_add_start_latency(int val) >> +{ >> + unsigned long flags; >> + >> + raw_local_irq_save(flags); >> + __preempt_count_add(val); >> + preempt_latency_start(val); >> + raw_local_irq_restore(flags); >> +} > >> +static inline void preempt_sub_stop_latency(int val) >> +{ >> + unsigned long flags; >> + >> + raw_local_irq_save(flags); >> + preempt_latency_stop(val); >> + __preempt_count_sub(val); >> + raw_local_irq_restore(flags); >> +} > > That is hideously expensive :/ Yeah... :-( Is there another way to provide such "atomicity"? Can I use the argument "if one has these tracepoints enabled, they are not considering it as a hot-path?" -- Daniel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change 2019-05-29 9:40 ` Daniel Bristot de Oliveira @ 2019-05-29 10:20 ` Peter Zijlstra 2019-05-29 12:39 ` Steven Rostedt 2019-05-29 13:51 ` Daniel Bristot de Oliveira 2019-05-31 7:47 ` Joel Fernandes 1 sibling, 2 replies; 27+ messages in thread From: Peter Zijlstra @ 2019-05-29 10:20 UTC (permalink / raw) To: Daniel Bristot de Oliveira Cc: linux-kernel, williams, daniel, Steven Rostedt (VMware), Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Matthias Kaehlcke, Joel Fernandes (Google), Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta On Wed, May 29, 2019 at 11:40:34AM +0200, Daniel Bristot de Oliveira wrote: > On 29/05/2019 10:33, Peter Zijlstra wrote: > > On Tue, May 28, 2019 at 05:16:23PM +0200, Daniel Bristot de Oliveira wrote: > >> The preempt_disable/enable tracepoint only traces in the disable <-> enable > >> case, which is correct. But think about this case: > >> > >> ---------------------------- %< ------------------------------ > >> THREAD IRQ > >> | | > >> preempt_disable() { > >> __preempt_count_add(1) > >> -------> smp_apic_timer_interrupt() { > >> preempt_disable() > >> do not trace (preempt count >= 1) > >> .... > >> preempt_enable() > >> do not trace (preempt count >= 1) > >> } > >> trace_preempt_disable(); > >> } > >> ---------------------------- >% ------------------------------ > >> > >> The tracepoint will be skipped. > > > > .... for the IRQ. But IRQs are not preemptible anyway, so what the > > problem? > > > right, they are. > > exposing my problem in a more specific way: > > To show in a model that an event always takes place with preemption disabled, > but not necessarily with IRQs disabled, it is worth having the preemption > disable events separated from IRQ disable ones. > > The main reason is that, although IRQs disabled postpone the execution of the > scheduler, it is more pessimistic, as it also delays IRQs. So the more precise > the model is, the less pessimistic the analysis will be. I'm not sure I follow, IRQs disabled fully implies !preemptible. I don't see how the model would be more pessimistic than reality if it were to use this knowledge. Any !0 preempt_count(), which very much includes (Hard)IRQ and SoftIRQ counts, means non-preemptible. ^ permalink raw reply [flat|nested] 27+ messages in thread
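Peter's closing statement maps directly onto the preempt_count() layout; the following is paraphrased from include/linux/preempt.h as it looked around this time, quoted from memory and meant only as a reminder, not as a copy of the header:

/*
 * All of these counts share the single preempt_count() word
 * (approximate layout for this era of the kernel):
 *
 *         PREEMPT_MASK:  0x000000ff   preempt_disable() nesting
 *         SOFTIRQ_MASK:  0x0000ff00   softirq nesting / BH disabled
 *         HARDIRQ_MASK:  0x000f0000   hardirq nesting
 *             NMI_MASK:  0x00100000   in NMI
 *
 * and, with CONFIG_PREEMPT:
 *
 *         #define preemptible()  (preempt_count() == 0 && !irqs_disabled())
 *
 * so hardirq and softirq contexts already read as non-preemptible,
 * which is the point being made above.
 */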
* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change 2019-05-29 10:20 ` Peter Zijlstra @ 2019-05-29 12:39 ` Steven Rostedt 2019-05-29 13:19 ` Peter Zijlstra 2019-05-29 13:51 ` Daniel Bristot de Oliveira 1 sibling, 1 reply; 27+ messages in thread From: Steven Rostedt @ 2019-05-29 12:39 UTC (permalink / raw) To: Peter Zijlstra Cc: Daniel Bristot de Oliveira, linux-kernel, williams, daniel, Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Matthias Kaehlcke, Joel Fernandes (Google), Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta On Wed, 29 May 2019 12:20:38 +0200 Peter Zijlstra <peterz@infradead.org> wrote: > On Wed, May 29, 2019 at 11:40:34AM +0200, Daniel Bristot de Oliveira wrote: > > On 29/05/2019 10:33, Peter Zijlstra wrote: > > > On Tue, May 28, 2019 at 05:16:23PM +0200, Daniel Bristot de Oliveira wrote: > > >> The preempt_disable/enable tracepoint only traces in the disable <-> enable > > >> case, which is correct. But think about this case: > > >> > > >> ---------------------------- %< ------------------------------ > > >> THREAD IRQ > > >> | | > > >> preempt_disable() { > > >> __preempt_count_add(1) > > >> -------> smp_apic_timer_interrupt() { > > >> preempt_disable() > > >> do not trace (preempt count >= 1) > > >> .... > > >> preempt_enable() > > >> do not trace (preempt count >= 1) > > >> } > > >> trace_preempt_disable(); > > >> } > > >> ---------------------------- >% ------------------------------ > > >> > > >> The tracepoint will be skipped. > > > > > > .... for the IRQ. But IRQs are not preemptible anyway, so what the > > > problem? > > > > > > right, they are. > > > > exposing my problem in a more specific way: > > > > To show in a model that an event always takes place with preemption disabled, > > but not necessarily with IRQs disabled, it is worth having the preemption > > disable events separated from IRQ disable ones. > > > > The main reason is that, although IRQs disabled postpone the execution of the > > scheduler, it is more pessimistic, as it also delays IRQs. So the more precise > > the model is, the less pessimistic the analysis will be. > > I'm not sure I follow, IRQs disabled fully implies !preemptible. I don't > see how the model would be more pessimistic than reality if it were to > use this knowledge. > > Any !0 preempt_count(), which very much includes (Hard)IRQ and SoftIRQ > counts, means non-preemptible. I believe I see what Daniel is talking about, but I hate the proposed solution ;-) First, if you care about real times that the CPU can't preempt (preempt_count != 0 or interrupts disabled), then you want the preempt_irqsoff tracer. The preempt_tracer is more academic where it just shows you when we disable preemption via the counter. But even with the preempt_irqsoff tracer you may not get the full length of time due to the above explained race. __preempt_count_add(1) <-- CPU now prevents preemption <interrupt> ----> trace_hardirqs_off() <-- Start preempt disable timer handler(); trace_hardirqs_on() <-- Stop preempt disable timer (Diff A) <---- trace_preempt_disable() <-- Start preempt disable timer [..] trace_preempt_enable() <-- Stop preempt disable timer (Diff B) __preempt_count_sub(1) <-- CPU re-enables preemption The preempt_irqsoff tracer will break this into two parts: Diff A and Diff B, when in reality, the total time the CPU prevented preemption was A + B. Note, we can have the same race if an interrupt came in just after calling trace_preempt_enable() and before the __preempt_count_sub(). 
What I would recommend is adding a flag to the task_struct that gets set before the __preempt_count_add() and cleared by the tracing function. If an interrupt goes off during this time, it will start the total time to record, and not end it on the trace_hardirqs_on() part. Now since we set this flag before disabling preemption, what if we get preempted before calling __preempt_count_add()?. Simple, have a hook in the scheduler (just connect to the sched_switch tracepoint) that checks that flag, and if it is set, it ends the preempt disable recording time. Also on scheduling that task back in, if that flag is set, start the preempt disable timer. It may get a bit more complex, but we can contain that complexity within the tracing code, and we don't have to do added irq disabling. -- Steve ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change 2019-05-29 12:39 ` Steven Rostedt @ 2019-05-29 13:19 ` Peter Zijlstra 2019-05-29 13:29 ` Peter Zijlstra 2019-05-29 13:42 ` Steven Rostedt 0 siblings, 2 replies; 27+ messages in thread From: Peter Zijlstra @ 2019-05-29 13:19 UTC (permalink / raw) To: Steven Rostedt Cc: Daniel Bristot de Oliveira, linux-kernel, williams, daniel, Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Matthias Kaehlcke, Joel Fernandes (Google), Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta On Wed, May 29, 2019 at 08:39:30AM -0400, Steven Rostedt wrote: > I believe I see what Daniel is talking about, but I hate the proposed > solution ;-) > > First, if you care about real times that the CPU can't preempt > (preempt_count != 0 or interrupts disabled), then you want the > preempt_irqsoff tracer. The preempt_tracer is more academic where it > just shows you when we disable preemption via the counter. But even > with the preempt_irqsoff tracer you may not get the full length of time > due to the above explained race. IOW, that tracer gives a completely 'make believe' number? What's the point? Just delete the pure preempt tracer. And the preempt_irqoff tracer had better also consume the IRQ events, and if it does that it can DTRT without extra bits on, even with that race. Consider: preempt_disable() preempt_count += 1; <IRQ> trace_irq_enter(); trace_irq_exit(); </IRQ> trace_preempt_disable(); /* does stuff */ preempt_enable() preempt_count -= 1; trace_preempt_enable(); You're saying preempt_irqoff() fails to connect the two because of the hole between trace_irq_exit() and trace_preempt_disable() ? But trace_irq_exit() can see the raised preempt_count and set state for trace_preempt_disable() to connect. > What I would recommend is adding a flag to the task_struct that gets > set before the __preempt_count_add() and cleared by the tracing > function. If an interrupt goes off during this time, it will start the > total time to record, and not end it on the trace_hardirqs_on() part. > Now since we set this flag before disabling preemption, what if we get > preempted before calling __preempt_count_add()?. Simple, have a hook in > the scheduler (just connect to the sched_switch tracepoint) that checks > that flag, and if it is set, it ends the preempt disable recording > time. Also on scheduling that task back in, if that flag is set, start > the preempt disable timer. I don't think that works, you also have to consider softirq. And yes you can make it more complicated, but I still don't see the point. And none of this is relevant for Daniels model stuff. He just needs to consider in-IRQ as !preempt. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change 2019-05-29 13:19 ` Peter Zijlstra @ 2019-05-29 13:29 ` Peter Zijlstra 2019-05-29 13:42 ` Steven Rostedt 1 sibling, 0 replies; 27+ messages in thread From: Peter Zijlstra @ 2019-05-29 13:29 UTC (permalink / raw) To: Steven Rostedt Cc: Daniel Bristot de Oliveira, linux-kernel, williams, daniel, Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Matthias Kaehlcke, Joel Fernandes (Google), Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta On Wed, May 29, 2019 at 03:19:57PM +0200, Peter Zijlstra wrote: > On Wed, May 29, 2019 at 08:39:30AM -0400, Steven Rostedt wrote: > > I believe I see what Daniel is talking about, but I hate the proposed > > solution ;-) > > > > First, if you care about real times that the CPU can't preempt > > (preempt_count != 0 or interrupts disabled), then you want the > > preempt_irqsoff tracer. The preempt_tracer is more academic where it > > just shows you when we disable preemption via the counter. But even > > with the preempt_irqsoff tracer you may not get the full length of time > > due to the above explained race. > > IOW, that tracer gives a completely 'make believe' number? What's the > point? Just delete the pure preempt tracer. Alternatively, fix the preempt tracer by having it completely disregard IRQs. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change 2019-05-29 13:19 ` Peter Zijlstra 2019-05-29 13:29 ` Peter Zijlstra @ 2019-05-29 13:42 ` Steven Rostedt 2019-05-29 13:49 ` Peter Zijlstra 1 sibling, 1 reply; 27+ messages in thread From: Steven Rostedt @ 2019-05-29 13:42 UTC (permalink / raw) To: Peter Zijlstra Cc: Daniel Bristot de Oliveira, linux-kernel, williams, daniel, Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Matthias Kaehlcke, Joel Fernandes (Google), Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta On Wed, 29 May 2019 15:19:57 +0200 Peter Zijlstra <peterz@infradead.org> wrote: > On Wed, May 29, 2019 at 08:39:30AM -0400, Steven Rostedt wrote: > > I believe I see what Daniel is talking about, but I hate the proposed > > solution ;-) > > > > First, if you care about real times that the CPU can't preempt > > (preempt_count != 0 or interrupts disabled), then you want the > > preempt_irqsoff tracer. The preempt_tracer is more academic where it > > just shows you when we disable preemption via the counter. But even > > with the preempt_irqsoff tracer you may not get the full length of time > > due to the above explained race. > > IOW, that tracer gives a completely 'make believe' number? What's the > point? Just delete the pure preempt tracer. The preempt_tracer is there as part of the preempt_irqsoff tracer implementation. By removing it, the only code we would remove is displaying preemptoff as a tracer. I stated this when it was created, that it was more of an academic exercise if you use it, but that code was required to get preempt_irqsoff working. > > And the preempt_irqoff tracer had better also consume the IRQ events, > and if it does that it can DTRT without extra bits on, even with that > race. > > Consider: > > preempt_disable() > preempt_count += 1; > <IRQ> > trace_irq_enter(); > > trace_irq_exit(); > </IRQ> > trace_preempt_disable(); > > /* does stuff */ > > preempt_enable() > preempt_count -= 1; > trace_preempt_enable(); > > You're saying preempt_irqoff() fails to connect the two because of the > hole between trace_irq_exit() and trace_preempt_disable() ? > > But trace_irq_exit() can see the raised preempt_count and set state > for trace_preempt_disable() to connect. That's basically what I was suggesting as the solution to this ;-) > > > What I would recommend is adding a flag to the task_struct that gets > > set before the __preempt_count_add() and cleared by the tracing > > function. If an interrupt goes off during this time, it will start > > the total time to record, and not end it on the trace_hardirqs_on() > > part. Now since we set this flag before disabling preemption, what > > if we get preempted before calling __preempt_count_add()?. Simple, > > have a hook in the scheduler (just connect to the sched_switch > > tracepoint) that checks that flag, and if it is set, it ends the > > preempt disable recording time. Also on scheduling that task back > > in, if that flag is set, start the preempt disable timer. > > I don't think that works, you also have to consider softirq. And yes > you can make it more complicated, but I still don't see the point. Note, there's places that disable preemption without being traced. If we trigger only on preemption being disabled and start the "timer", there may not be any code to stop it. That was why I recommended the flag in the code that starts the timing. > > And none of this is relevant for Daniels model stuff. He just needs to > consider in-IRQ as !preempt. 
But he does bring up an issues that preempt_irqsoff fails. -- Steve ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change 2019-05-29 13:42 ` Steven Rostedt @ 2019-05-29 13:49 ` Peter Zijlstra 2019-05-29 13:58 ` Steven Rostedt 0 siblings, 1 reply; 27+ messages in thread From: Peter Zijlstra @ 2019-05-29 13:49 UTC (permalink / raw) To: Steven Rostedt Cc: Daniel Bristot de Oliveira, linux-kernel, williams, daniel, Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Matthias Kaehlcke, Joel Fernandes (Google), Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta On Wed, May 29, 2019 at 09:42:13AM -0400, Steven Rostedt wrote: > > And the preempt_irqoff tracer had better also consume the IRQ events, > > and if it does that it can DTRT without extra bits on, even with that > > race. > > > > Consider: > > > > preempt_disable() > > preempt_count += 1; > > <IRQ> > > trace_irq_enter(); > > > > trace_irq_exit(); > > </IRQ> > > trace_preempt_disable(); > > > > /* does stuff */ > > > > preempt_enable() > > preempt_count -= 1; > > trace_preempt_enable(); > > > > You're saying preempt_irqoff() fails to connect the two because of the > > hole between trace_irq_exit() and trace_preempt_disable() ? > > > > But trace_irq_exit() can see the raised preempt_count and set state > > for trace_preempt_disable() to connect. > > That's basically what I was suggesting as the solution to this ;-) You were wanting changes to preempt_disable() and task_struct, neither of which is required. The above only needs some per-cpu storage in the tracer implementation. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change 2019-05-29 13:49 ` Peter Zijlstra @ 2019-05-29 13:58 ` Steven Rostedt 0 siblings, 0 replies; 27+ messages in thread From: Steven Rostedt @ 2019-05-29 13:58 UTC (permalink / raw) To: Peter Zijlstra Cc: Daniel Bristot de Oliveira, linux-kernel, williams, daniel, Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Matthias Kaehlcke, Joel Fernandes (Google), Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta On Wed, 29 May 2019 15:49:46 +0200 Peter Zijlstra <peterz@infradead.org> wrote: > > That's basically what I was suggesting as the solution to this ;-) > > You were wanting changes to preempt_disable() and task_struct, neither > of which is required. The above only needs some per-cpu storage in the > tracer implementation. Only changes were to the trace preempt_disable() code. Which I still think needs to be done regardless to differentiate between when we are tracing preempt disable and when we are not. And no modification would need to be done to task_struct as we already have a place to add tracing flags. -- Steve ^ permalink raw reply [flat|nested] 27+ messages in thread
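A rough sketch of the per-CPU state Peter describes above, only to make the suggestion concrete: every name here (the hooks, the variable) is hypothetical and this is not existing tracer code. The idea is that the irq-exit side leaves a marker when it sees a preemption count already raised by the interrupted context, and the later preempt-off event picks that marker up.

/* Illustrative sketch only: hypothetical hooks inside the preemptirq
 * tracer, not existing kernel code. */
static DEFINE_PER_CPU(bool, preempt_off_pending);

/* Called from the tracer's irq-exit side, with IRQs still disabled. */
static void tracer_irq_exit_hook(void)
{
        /*
         * The interrupted context already bumped the preemption count
         * but has not reached trace_preempt_disable() yet: remember that
         * its preempt-off section effectively continues from here.
         */
        if (preempt_count() & PREEMPT_MASK)
                this_cpu_write(preempt_off_pending, true);
}

/* Called when trace_preempt_off() finally fires in the interrupted task. */
static void tracer_preempt_off_hook(unsigned long ip, unsigned long parent_ip)
{
        if (this_cpu_read(preempt_off_pending)) {
                this_cpu_write(preempt_off_pending, false);
                /* Connect: account the time since irq exit to this section. */
        }
        /* ... start the preempt-off timing as usual ... */
}

This is the "per-cpu storage in the tracer implementation" referred to above: the connection is made entirely inside the tracer, without touching preempt_count_add()/sub() or disabling IRQs in the hot path.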
* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change 2019-05-29 10:20 ` Peter Zijlstra 2019-05-29 12:39 ` Steven Rostedt @ 2019-05-29 13:51 ` Daniel Bristot de Oliveira 2019-05-29 18:21 ` Peter Zijlstra 1 sibling, 1 reply; 27+ messages in thread From: Daniel Bristot de Oliveira @ 2019-05-29 13:51 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, williams, daniel, Steven Rostedt (VMware), Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Matthias Kaehlcke, Joel Fernandes (Google), Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta On 29/05/2019 12:20, Peter Zijlstra wrote: > On Wed, May 29, 2019 at 11:40:34AM +0200, Daniel Bristot de Oliveira wrote: >> On 29/05/2019 10:33, Peter Zijlstra wrote: >>> On Tue, May 28, 2019 at 05:16:23PM +0200, Daniel Bristot de Oliveira wrote: >>>> The preempt_disable/enable tracepoint only traces in the disable <-> enable >>>> case, which is correct. But think about this case: >>>> >>>> ---------------------------- %< ------------------------------ >>>> THREAD IRQ >>>> | | >>>> preempt_disable() { >>>> __preempt_count_add(1) >>>> -------> smp_apic_timer_interrupt() { >>>> preempt_disable() >>>> do not trace (preempt count >= 1) >>>> .... >>>> preempt_enable() >>>> do not trace (preempt count >= 1) >>>> } >>>> trace_preempt_disable(); >>>> } >>>> ---------------------------- >% ------------------------------ >>>> >>>> The tracepoint will be skipped. >>> >>> .... for the IRQ. But IRQs are not preemptible anyway, so what the >>> problem? >> >> >> right, they are. >> >> exposing my problem in a more specific way: >> >> To show in a model that an event always takes place with preemption disabled, >> but not necessarily with IRQs disabled, it is worth having the preemption >> disable events separated from IRQ disable ones. >> >> The main reason is that, although IRQs disabled postpone the execution of the >> scheduler, it is more pessimistic, as it also delays IRQs. So the more precise >> the model is, the less pessimistic the analysis will be. > > I'm not sure I follow, IRQs disabled fully implies !preemptible. I don't > see how the model would be more pessimistic than reality if it were to > use this knowledge. Maybe I did not expressed myself well... and the example was not good either. "IRQs disabled fully implies !preemptible" is a "to big" step. In modeling (or mathematical reasoning?), a good practice is to break the properties into small piece, and then build more complex reasoning/implications using these "small properties." Doing "big steps" makes you prone "miss interpretations", creating ambiguity. Then, -RT people are prone to be pessimist, non-RT optimistic, and so on... and that is what models try to avoid. For instance, explaining this using words is contradictory:> > Any !0 preempt_count(), which very much includes (Hard)IRQ and SoftIRQ > counts, means non-preemptible. One might argue that, the preemption of a thread always takes place with preempt_count() != 0, because __schedule() is always called with preemption disabled, so the preemption takes place while in non-preemptive. A more elaborated example: ------------------ %< -------------------------- Thread A is running, and goes to sleep waiting for a timer... schedule() { preempt_disable(); __schedule() { smp_apic_timer_interrupt() { sched_wakeup (Thread A); sched_wakeup (Thread B: highest prio) { sched_set_need_resched(); } } local_irq_disable() context switch to B, A leaves in state=R. 
------------------ %< -------------------------- In this case, the thread A suffered a "preemption" with "!0 preempt_count()" The fact is, Linux does not fit straight in the "well known terminology" of academic papers because many of those terminology bases in the fact that operations are atomic. But they are not and Linux has some behaviors that desires new terminology/interpretation... - WAIT But you (daniel) wants to fake the atomicity between preempt_disable and its tracepoint! Yes, I do, but this is a very straightforward step/assumption: the atomicity is about the real-event and the tracepoint that notifies it. It is not about two different events. That is why it is worth letting the modeling rules to clarify the behavior of system, without doing non-obvious implication in the code part, so we can have a model that fits better in the Linux actions/events to avoid ambiguity. [ note 1: the tracepoint is only enabled if CONFIG_PREEMPTIRQ_TRACEPOINTS=y which is not enabled by default ] [ note 2: I just saw that Steven replied while I was writing this email... I will read them now... sorry for some repetitive topic here ] -- Daniel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change 2019-05-29 13:51 ` Daniel Bristot de Oliveira @ 2019-05-29 18:21 ` Peter Zijlstra 2019-06-04 10:20 ` Daniel Bristot de Oliveira 0 siblings, 1 reply; 27+ messages in thread From: Peter Zijlstra @ 2019-05-29 18:21 UTC (permalink / raw) To: Daniel Bristot de Oliveira Cc: linux-kernel, williams, daniel, Steven Rostedt (VMware), Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Matthias Kaehlcke, Joel Fernandes (Google), Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta On Wed, May 29, 2019 at 03:51:31PM +0200, Daniel Bristot de Oliveira wrote: > On 29/05/2019 12:20, Peter Zijlstra wrote: > > I'm not sure I follow, IRQs disabled fully implies !preemptible. I don't > > see how the model would be more pessimistic than reality if it were to > > use this knowledge. > > Maybe I did not expressed myself well... and the example was not good either. > > "IRQs disabled fully implies !preemptible" is a "to big" step. In modeling (or > mathematical reasoning?), a good practice is to break the properties into small > piece, and then build more complex reasoning/implications using these "small > properties." > > Doing "big steps" makes you prone "miss interpretations", creating ambiguity. > Then, -RT people are prone to be pessimist, non-RT optimistic, and so on... and > that is what models try to avoid. You already construct the big model out of small generators, this is just one more of those little generators. > For instance, explaining this using words is contradictory:> > > Any !0 preempt_count(), which very much includes (Hard)IRQ and SoftIRQ > > counts, means non-preemptible. > > One might argue that, the preemption of a thread always takes place with > preempt_count() != 0, because __schedule() is always called with preemption > disabled, so the preemption takes place while in non-preemptive. Yeah, I know about that one; you've used it in your talks. Also, you've modeled the schedule preempt disable as a special state. If you want we can actually make it a special bit in the preempt_count word too, the patch shouldn't be too hard, although it would make no practical difference. > - WAIT But you (daniel) wants to fake the atomicity between preempt_disable and > its tracepoint! > > Yes, I do, but this is a very straightforward step/assumption: the atomicity is > about the real-event and the tracepoint that notifies it. It is not about two > different events. > > That is why it is worth letting the modeling rules to clarify the behavior of > system, without doing non-obvious implication in the code part, so we can have a > model that fits better in the Linux actions/events to avoid ambiguity. You can easily build a little shim betwen the model and the tracehooks that fix this up if you don't want to stick it in the model proper. All the information is there. Heck you can even do that 3/3 thing internally I think. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change 2019-05-29 18:21 ` Peter Zijlstra @ 2019-06-04 10:20 ` Daniel Bristot de Oliveira 0 siblings, 0 replies; 27+ messages in thread From: Daniel Bristot de Oliveira @ 2019-06-04 10:20 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, williams, daniel, Steven Rostedt (VMware), Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Matthias Kaehlcke, Joel Fernandes (Google), Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta On 29/05/2019 20:21, Peter Zijlstra wrote: > On Wed, May 29, 2019 at 03:51:31PM +0200, Daniel Bristot de Oliveira wrote: >> On 29/05/2019 12:20, Peter Zijlstra wrote: > >>> I'm not sure I follow, IRQs disabled fully implies !preemptible. I don't >>> see how the model would be more pessimistic than reality if it were to >>> use this knowledge. >> >> Maybe I did not expressed myself well... and the example was not good either. >> >> "IRQs disabled fully implies !preemptible" is a "to big" step. In modeling (or >> mathematical reasoning?), a good practice is to break the properties into small >> piece, and then build more complex reasoning/implications using these "small >> properties." >> >> Doing "big steps" makes you prone "miss interpretations", creating ambiguity. >> Then, -RT people are prone to be pessimist, non-RT optimistic, and so on... and >> that is what models try to avoid. > > You already construct the big model out of small generators, this is > just one more of those little generators. Yes, we can take that way too... >> For instance, explaining this using words is contradictory:> >>> Any !0 preempt_count(), which very much includes (Hard)IRQ and SoftIRQ >>> counts, means non-preemptible. >> >> One might argue that, the preemption of a thread always takes place with >> preempt_count() != 0, because __schedule() is always called with preemption >> disabled, so the preemption takes place while in non-preemptive. > > Yeah, I know about that one; you've used it in your talks. Also, you've > modeled the schedule preempt disable as a special state. If you want we > can actually make it a special bit in the preempt_count word too, the > patch shouldn't be too hard, although it would make no practical > difference. Good to know! I am trying not to change the code or abstractions. In the user-space version I was using the caller address to figure out if it was a call in the scheduler or not. In the kernel version I am planing to use a is_sched_function() like approach. But if nothing of that works, I will explore this possibility, thanks for the suggestion. >> - WAIT But you (daniel) wants to fake the atomicity between preempt_disable and >> its tracepoint! >> >> Yes, I do, but this is a very straightforward step/assumption: the atomicity is >> about the real-event and the tracepoint that notifies it. It is not about two >> different events. >> >> That is why it is worth letting the modeling rules to clarify the behavior of >> system, without doing non-obvious implication in the code part, so we can have a >> model that fits better in the Linux actions/events to avoid ambiguity. > > You can easily build a little shim betwen the model and the tracehooks > that fix this up if you don't want to stick it in the model proper. > > All the information is there. Heck you can even do that 3/3 thing > internally I think. > Ack, let's see what I can came up. Thanks! -- Daniel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change 2019-05-29 9:40 ` Daniel Bristot de Oliveira 2019-05-29 10:20 ` Peter Zijlstra @ 2019-05-31 7:47 ` Joel Fernandes 2019-06-04 10:12 ` Daniel Bristot de Oliveira 1 sibling, 1 reply; 27+ messages in thread From: Joel Fernandes @ 2019-05-31 7:47 UTC (permalink / raw) To: Daniel Bristot de Oliveira Cc: Peter Zijlstra, linux-kernel, williams, daniel, Steven Rostedt (VMware), Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Matthias Kaehlcke, Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta On Wed, May 29, 2019 at 11:40:34AM +0200, Daniel Bristot de Oliveira wrote: > On 29/05/2019 10:33, Peter Zijlstra wrote: > > On Tue, May 28, 2019 at 05:16:23PM +0200, Daniel Bristot de Oliveira wrote: > >> The preempt_disable/enable tracepoint only traces in the disable <-> enable > >> case, which is correct. But think about this case: > >> > >> ---------------------------- %< ------------------------------ > >> THREAD IRQ > >> | | > >> preempt_disable() { > >> __preempt_count_add(1) > >> -------> smp_apic_timer_interrupt() { > >> preempt_disable() > >> do not trace (preempt count >= 1) > >> .... > >> preempt_enable() > >> do not trace (preempt count >= 1) > >> } > >> trace_preempt_disable(); > >> } > >> ---------------------------- >% ------------------------------ > >> > >> The tracepoint will be skipped. > > > > .... for the IRQ. But IRQs are not preemptible anyway, so what the > > problem? > > > right, they are. > > exposing my problem in a more specific way: > > To show in a model that an event always takes place with preemption disabled, > but not necessarily with IRQs disabled, it is worth having the preemption > disable events separated from IRQ disable ones. > > The main reason is that, although IRQs disabled postpone the execution of the > scheduler, it is more pessimistic, as it also delays IRQs. So the more precise > the model is, the less pessimistic the analysis will be. > > But there are other use-cases, for instance: > > (Steve, correct me if I am wrong) > > The preempt_tracer will not notice a "preempt disabled" section in an IRQ > handler if the problem above happens. > > (Yeah, I know these problems are very specific... but...) I agree with the problem. I think Daniel does not want to miss the preemption disabled event caused by the IRQ disabling. > >> To avoid skipping the trace, the change in the counter should be "atomic" > >> with the start/stop, w.r.t the interrupts. > >> > >> Disable interrupts while the adding/starting stopping/subtracting. > > > >> +static inline void preempt_add_start_latency(int val) > >> +{ > >> + unsigned long flags; > >> + > >> + raw_local_irq_save(flags); > >> + __preempt_count_add(val); > >> + preempt_latency_start(val); > >> + raw_local_irq_restore(flags); > >> +} > > > >> +static inline void preempt_sub_stop_latency(int val) > >> +{ > >> + unsigned long flags; > >> + > >> + raw_local_irq_save(flags); > >> + preempt_latency_stop(val); > >> + __preempt_count_sub(val); > >> + raw_local_irq_restore(flags); > >> +} > > > > That is hideously expensive :/ > > Yeah... :-( Is there another way to provide such "atomicity"? > > Can I use the argument "if one has these tracepoints enabled, they are not > considering it as a hot-path?" The only addition here seems to the raw_local_irq_{save,restore} around the calls to increment the preempt counter and start the latency tracking. Is there any performance data with the tracepoint enabled and with/without this patch? 
Like with hackbench? Thanks. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change 2019-05-31 7:47 ` Joel Fernandes @ 2019-06-04 10:12 ` Daniel Bristot de Oliveira 2019-06-04 15:01 ` Steven Rostedt 0 siblings, 1 reply; 27+ messages in thread From: Daniel Bristot de Oliveira @ 2019-06-04 10:12 UTC (permalink / raw) To: Joel Fernandes Cc: Peter Zijlstra, linux-kernel, williams, daniel, Steven Rostedt (VMware), Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Matthias Kaehlcke, Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta On 31/05/2019 09:47, Joel Fernandes wrote: > On Wed, May 29, 2019 at 11:40:34AM +0200, Daniel Bristot de Oliveira wrote: >> On 29/05/2019 10:33, Peter Zijlstra wrote: >>> On Tue, May 28, 2019 at 05:16:23PM +0200, Daniel Bristot de Oliveira wrote: >>>> The preempt_disable/enable tracepoint only traces in the disable <-> enable >>>> case, which is correct. But think about this case: >>>> >>>> ---------------------------- %< ------------------------------ >>>> THREAD IRQ >>>> | | >>>> preempt_disable() { >>>> __preempt_count_add(1) >>>> -------> smp_apic_timer_interrupt() { >>>> preempt_disable() >>>> do not trace (preempt count >= 1) >>>> .... >>>> preempt_enable() >>>> do not trace (preempt count >= 1) >>>> } >>>> trace_preempt_disable(); >>>> } >>>> ---------------------------- >% ------------------------------ >>>> >>>> The tracepoint will be skipped. >>> >>> .... for the IRQ. But IRQs are not preemptible anyway, so what the >>> problem? >> >> >> right, they are. >> >> exposing my problem in a more specific way: >> >> To show in a model that an event always takes place with preemption disabled, >> but not necessarily with IRQs disabled, it is worth having the preemption >> disable events separated from IRQ disable ones. >> >> The main reason is that, although IRQs disabled postpone the execution of the >> scheduler, it is more pessimistic, as it also delays IRQs. So the more precise >> the model is, the less pessimistic the analysis will be. >> >> But there are other use-cases, for instance: >> >> (Steve, correct me if I am wrong) >> >> The preempt_tracer will not notice a "preempt disabled" section in an IRQ >> handler if the problem above happens. >> >> (Yeah, I know these problems are very specific... but...) > > I agree with the problem. I think Daniel does not want to miss the preemption > disabled event caused by the IRQ disabling. Hi Joel! Correct, but ... look bellow. >>>> To avoid skipping the trace, the change in the counter should be "atomic" >>>> with the start/stop, w.r.t the interrupts. >>>> >>>> Disable interrupts while the adding/starting stopping/subtracting. >>> >>>> +static inline void preempt_add_start_latency(int val) >>>> +{ >>>> + unsigned long flags; >>>> + >>>> + raw_local_irq_save(flags); >>>> + __preempt_count_add(val); >>>> + preempt_latency_start(val); >>>> + raw_local_irq_restore(flags); >>>> +} >>> >>>> +static inline void preempt_sub_stop_latency(int val) >>>> +{ >>>> + unsigned long flags; >>>> + >>>> + raw_local_irq_save(flags); >>>> + preempt_latency_stop(val); >>>> + __preempt_count_sub(val); >>>> + raw_local_irq_restore(flags); >>>> +} >>> >>> That is hideously expensive :/ >> >> Yeah... :-( Is there another way to provide such "atomicity"? >> >> Can I use the argument "if one has these tracepoints enabled, they are not >> considering it as a hot-path?" > > The only addition here seems to the raw_local_irq_{save,restore} around the > calls to increment the preempt counter and start the latency tracking. 
> > Is there any performance data with the tracepoint enabled and with/without > this patch? Like with hackbench? I discussed this with Steve at the Summit on the Summit (the reason I did not reply to this email earlier is that I was at the conference/traveling), and he also agrees with peterz: disabling and (mainly) re-enabling IRQs costs too much. We need to find another way to resolve this problem (or mitigate the cost)... :-(. Ideas? Thanks!! -- Daniel ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change 2019-06-04 10:12 ` Daniel Bristot de Oliveira @ 2019-06-04 15:01 ` Steven Rostedt 2019-06-05 15:16 ` Joel Fernandes 0 siblings, 1 reply; 27+ messages in thread From: Steven Rostedt @ 2019-06-04 15:01 UTC (permalink / raw) To: Daniel Bristot de Oliveira Cc: Joel Fernandes, Peter Zijlstra, linux-kernel, williams, daniel, Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Matthias Kaehlcke, Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta On Tue, 4 Jun 2019 12:12:36 +0200 Daniel Bristot de Oliveira <bristot@redhat.com> wrote: > I discussed this with Steve at the Summit on the Summit (the reason why I did > not reply this email earlier is because I was in the conf/traveling), and he > also agrees with peterz, disabling and (mainly) re-enabling IRQs costs too much. > > We need to find another way to resolve this problem (or mitigate the cost).... :-(. > > Ideas? I thought we talked about using flags in the pc to let us know that we are about to trace the preemption off? If an interrupt comes in, we check the flag: irq_enter() preempt_count_add(HARDIRQ_OFFSET) Then we can have something like: preempt_count_add(val) { int pc = preempt_count(); int trace_val = TRACE_FLAG; if (val == HARDIRQ_OFFSET && (pc & TRACE_FLAG)) { __preempt_count_sub(TRACE_FLAG); trace_val |= TRACE_SET; } __preempt_count_add(val + trace_val); if (!pc) trace_preempt_disable(); __preempt_count_sub(trace_val); And in trace_preempt_enable() we have: if ((preempt_count() & TRACE_SET) && in_irq()) return; Thus, we wont call the preempt_enable tracing code if we started it due to an interrupt preempting the process of setting the trace. Note, I'm writing this while extremely tired (still have yet to get more than 4 hours of sleep) so it may still have bugs, but you should get the idea ;-) -- Steve ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change 2019-06-04 15:01 ` Steven Rostedt @ 2019-06-05 15:16 ` Joel Fernandes 0 siblings, 0 replies; 27+ messages in thread From: Joel Fernandes @ 2019-06-05 15:16 UTC (permalink / raw) To: Steven Rostedt Cc: Daniel Bristot de Oliveira, Peter Zijlstra, linux-kernel, williams, daniel, Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Matthias Kaehlcke, Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta On Tue, Jun 04, 2019 at 11:01:27AM -0400, Steven Rostedt wrote: > On Tue, 4 Jun 2019 12:12:36 +0200 > Daniel Bristot de Oliveira <bristot@redhat.com> wrote: > > > I discussed this with Steve at the Summit on the Summit (the reason why I did > > not reply this email earlier is because I was in the conf/traveling), and he > > also agrees with peterz, disabling and (mainly) re-enabling IRQs costs too much. > > > > We need to find another way to resolve this problem (or mitigate the cost).... :-(. > > > > Ideas? > > I thought we talked about using flags in the pc to let us know that we > are about to trace the preemption off? > > > If an interrupt comes in, we check the flag: > > irq_enter() > preempt_count_add(HARDIRQ_OFFSET) > > > Then we can have something like: > > preempt_count_add(val) { > int pc = preempt_count(); > int trace_val = TRACE_FLAG; > > if (val == HARDIRQ_OFFSET && (pc & TRACE_FLAG)) { > __preempt_count_sub(TRACE_FLAG); > trace_val |= TRACE_SET; > } > > __preempt_count_add(val + trace_val); > if (!pc) > trace_preempt_disable(); > __preempt_count_sub(trace_val); > > > And in trace_preempt_enable() we have: > > if ((preempt_count() & TRACE_SET) && in_irq()) > return; > > Thus, we wont call the preempt_enable tracing code if we started it due > to an interrupt preempting the process of setting the trace. Hmm, the interrupt handler will not be calling preempt_enable anyway since the preempt counter was already set by the interrupted code. The situation Daniel describes in patch 2/3 is: ---------------------------- %< ------------------------------ THREAD IRQ | | preempt_disable() { __preempt_count_add(1) -------> smp_apic_timer_interrupt() { preempt_disable() do not trace (preempt count >= 1) .... preempt_enable() do not trace (preempt count >= 1) } trace_preempt_disable(); } ---------------------------- >% ------------------------------ Here the preempt_enable in the IRQ will not call tracing. If I understand correctly, he *does* want to call tracing since the IRQ's preempt disable/enable section could be running for some time, and so it would not be nice to miss these events from the traces. Regarding the preempt count flag idea, I really like your idea. I did not fully follow the code snip you shared above though. My understanding of your idea initially was, we set a flag in pc, then increment the preempt count, call tracing and the reset the pc flag. In between the increment and the call to the tracer, if we get interrupted, then we check the flag from IRQ context, and still call tracing. If we get interrupted after calling tracing, but before resetting the flag, then we may end up with a nested pair of preempt enable/disable calls, but that's Ok. At least we wont miss any events. At least I think we only need to modify the preempt disable path, something like this in pseudo code in the preempt disable path (of the THREAD context): set pc flag inc pc <--- if interrupted here, still trace from IRQ. call tracer <--- if interrupted here, still trace from IRQ. 
reset pc flag <--- if interrupted here, no need to trace from IRQ since we already recorded at least one preempt disable event. Did I get your idea right? I need to think about this some more, but got to run to an appointment. Will take a look once I'm back. > Note, I'm writing this while extremely tired (still have yet to get > more than 4 hours of sleep) so it may still have bugs, but you should > get the idea ;-) No problem ;-) thanks, - Joel ^ permalink raw reply [flat|nested] 27+ messages in thread
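Written out as code, Joel's sequence might look like the sketch below. It is only a reading of the pseudo code above, not code from the series: PREEMPT_TRACE_FLAG and its bit position are invented, assuming a currently unused bit in preempt_count() can be reserved for the tracer, and the sketch deliberately accepts the duplicated events discussed above rather than missing one.

---------------------------- %< ------------------------------
/* Sketch of the pseudo code above; the flag bit is invented. */
#define PREEMPT_TRACE_FLAG	(1 << 28)	/* assumed-free bit */

/* THREAD side */
void preempt_count_add(int val)
{
	__preempt_count_add(PREEMPT_TRACE_FLAG);	/* set pc flag   */
	__preempt_count_add(val);			/* inc pc        */

	/* just disabled preemption (ignoring the trace flag itself)? */
	if ((preempt_count() & ~PREEMPT_TRACE_FLAG) == val)
		trace_preempt_off(CALLER_ADDR0, get_lock_parent_ip());

	__preempt_count_sub(PREEMPT_TRACE_FLAG);	/* reset pc flag */
}

/*
 * IRQ side, run from the interrupt's own preempt accounting: if the
 * interrupted thread had set the flag but not yet reached its
 * tracepoint, report the preempt-off section from here so it is not
 * lost. The thread may report it again when it resumes; as discussed
 * above, that duplicate is accepted in exchange for never missing the
 * event.
 */
static void irq_report_pending_preempt_off(void)
{
	if (preempt_count() & PREEMPT_TRACE_FLAG)
		trace_preempt_off(CALLER_ADDR0, get_lock_parent_ip());
}
---------------------------- >% ------------------------------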
* [RFC 3/3] preempt_tracer: Use a percpu variable to control traceble calls 2019-05-28 15:16 [RFC 0/3] preempt_tracer: Fix preempt_disable tracepoint Daniel Bristot de Oliveira 2019-05-28 15:16 ` [RFC 1/3] softirq: Use preempt_latency_stop/start to trace preemption Daniel Bristot de Oliveira 2019-05-28 15:16 ` [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change Daniel Bristot de Oliveira @ 2019-05-28 15:16 ` Daniel Bristot de Oliveira 2019-05-29 8:41 ` Peter Zijlstra 2 siblings, 1 reply; 27+ messages in thread From: Daniel Bristot de Oliveira @ 2019-05-28 15:16 UTC (permalink / raw) To: linux-kernel Cc: williams, daniel, Steven Rostedt (VMware), Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Paul E. McKenney, Matthias Kaehlcke, Joel Fernandes (Google), Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta The preempt_disable tracepoint only traces in the disable <-> enable case. Which is correct, but think about this case: --------------------------- %< ---------------------- THREAD IRQ | | preempt_disable_notrace() { __preempt_count_add(1) } /* preemption disabled/IRQs enabled */ -------> smp_apic_timer_interrupt() { preempt_disable() { do not trace (preempt count >= 1) } .... preempt_enable() { do not trace (preempt count >= 1) } <------- } preempt_enable_notrace() { __preempt_count_sub(1) } --------------------------- >% ---------------------- The non-traceble preempt_disable can hide a legit preempt_disable (which is worth tracing). It is possible to observe this problem using this kernel module: http://bristot.me/files/efficient_verification/wip.tar.gz and doing the following trace: # cd /sys/kernel/debug/tracing/ # echo 1 > snapshot # cat available_events | grep preempt_ > set_event # echo irq_vectors >> /sys/kernel/debug/tracing/set_event # insmod wip.ko /* wait for a snapshot creation */ # tail -100 snapshot sshd-1159 [000] d...1.. 2440.866116: preempt_enable: caller=migrate_enable+0x1bb/0x330 parent=migrate_enable+0x1bb/0x330 sshd-1159 [000] d..h1.. 2440.866122: local_timer_entry: vector=236 sshd-1159 [000] d..h1.. 2440.866127: local_timer_exit: vector=236 sshd-1159 [000] d.L.4.. 2440.866129: process_event: event sched_waking not expected in the state preemptive sshd-1159 [000] d.L.4.. 2440.866137: <stack trace> => process_event => __handle_event => ttwu_do_wakeup => try_to_wake_up => irq_exit => smp_apic_timer_interrupt => apic_timer_interrupt => kvm_clock_read => ktime_get_with_offset => posix_get_boottime => __x64_sys_clock_gettime => do_syscall_64 => entry_SYSCALL_64_after_hwframe and kvm_clock_read() disables preemption without tracing: --------------------------- %< ---------------------- static u64 kvm_clock_read(void) { u64 ret; preempt_disable_notrace(); ret = pvclock_clocksource_read(this_cpu_pvti()); preempt_enable_notrace(); return ret; } --------------------------- >% ---------------------- Use a percpu variable for the traced preempt_disable/enable, and use it to decide whether trace or not. Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "Paul E. 
McKenney" <paulmck@linux.vnet.ibm.com> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Yangtao Li <tiny.windzz@gmail.com> Cc: Tommaso Cucinotta <tommaso.cucinotta@santannapisa.it> Cc: linux-kernel@vger.kernel.org --- kernel/sched/core.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index be4117d7384f..2e07d4174778 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -3148,19 +3148,25 @@ static inline void sched_tick_stop(int cpu) { } #if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \ defined(CONFIG_TRACE_PREEMPT_TOGGLE)) + +DEFINE_PER_CPU(int, __traced_preempt_count) = 0; /* * If the value passed in is equal to the current preempt count * then we just disabled preemption. Start timing the latency. */ void preempt_latency_start(int val) { - if (preempt_count() == val) { + int curr = this_cpu_read(__traced_preempt_count); + + if (!curr) { unsigned long ip = get_lock_parent_ip(); #ifdef CONFIG_DEBUG_PREEMPT current->preempt_disable_ip = ip; #endif trace_preempt_off(CALLER_ADDR0, ip); } + + this_cpu_write(__traced_preempt_count, curr + val); } static inline void preempt_add_start_latency(int val) @@ -3200,8 +3206,12 @@ NOKPROBE_SYMBOL(preempt_count_add); */ void preempt_latency_stop(int val) { - if (preempt_count() == val) + int curr = this_cpu_read(__traced_preempt_count) - val; + + if (!curr) trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip()); + + this_cpu_write(__traced_preempt_count, curr); } static inline void preempt_sub_stop_latency(int val) -- 2.20.1 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [RFC 3/3] preempt_tracer: Use a percpu variable to control traceble calls 2019-05-28 15:16 ` [RFC 3/3] preempt_tracer: Use a percpu variable to control traceble calls Daniel Bristot de Oliveira @ 2019-05-29 8:41 ` Peter Zijlstra 2019-05-29 9:48 ` Daniel Bristot de Oliveira 0 siblings, 1 reply; 27+ messages in thread From: Peter Zijlstra @ 2019-05-29 8:41 UTC (permalink / raw) To: Daniel Bristot de Oliveira Cc: linux-kernel, williams, daniel, Steven Rostedt (VMware), Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Matthias Kaehlcke, Joel Fernandes (Google), Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta On Tue, May 28, 2019 at 05:16:24PM +0200, Daniel Bristot de Oliveira wrote: > #if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \ > defined(CONFIG_TRACE_PREEMPT_TOGGLE)) > + > +DEFINE_PER_CPU(int, __traced_preempt_count) = 0; > /* > * If the value passed in is equal to the current preempt count > * then we just disabled preemption. Start timing the latency. > */ > void preempt_latency_start(int val) > { > - if (preempt_count() == val) { > + int curr = this_cpu_read(__traced_preempt_count); We actually have this_cpu_add_return(); > + > + if (!curr) { > unsigned long ip = get_lock_parent_ip(); > #ifdef CONFIG_DEBUG_PREEMPT > current->preempt_disable_ip = ip; > #endif > trace_preempt_off(CALLER_ADDR0, ip); > } > + > + this_cpu_write(__traced_preempt_count, curr + val); > } > > static inline void preempt_add_start_latency(int val) > @@ -3200,8 +3206,12 @@ NOKPROBE_SYMBOL(preempt_count_add); > */ > void preempt_latency_stop(int val) > { > - if (preempt_count() == val) > + int curr = this_cpu_read(__traced_preempt_count) - val; this_cpu_sub_return(); > + > + if (!curr) > trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip()); > + > + this_cpu_write(__traced_preempt_count, curr); > } Can't say I love this, but it is miles better than the last patch. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC 3/3] preempt_tracer: Use a percpu variable to control traceble calls 2019-05-29 8:41 ` Peter Zijlstra @ 2019-05-29 9:48 ` Daniel Bristot de Oliveira 0 siblings, 0 replies; 27+ messages in thread From: Daniel Bristot de Oliveira @ 2019-05-29 9:48 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, williams, daniel, Steven Rostedt (VMware), Ingo Molnar, Thomas Gleixner, Paul E. McKenney, Matthias Kaehlcke, Joel Fernandes (Google), Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta On 29/05/2019 10:41, Peter Zijlstra wrote: > On Tue, May 28, 2019 at 05:16:24PM +0200, Daniel Bristot de Oliveira wrote: >> #if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \ >> defined(CONFIG_TRACE_PREEMPT_TOGGLE)) >> + >> +DEFINE_PER_CPU(int, __traced_preempt_count) = 0; >> /* >> * If the value passed in is equal to the current preempt count >> * then we just disabled preemption. Start timing the latency. >> */ >> void preempt_latency_start(int val) >> { >> - if (preempt_count() == val) { >> + int curr = this_cpu_read(__traced_preempt_count); > > We actually have this_cpu_add_return(); > >> + >> + if (!curr) { >> unsigned long ip = get_lock_parent_ip(); >> #ifdef CONFIG_DEBUG_PREEMPT >> current->preempt_disable_ip = ip; >> #endif >> trace_preempt_off(CALLER_ADDR0, ip); >> } >> + >> + this_cpu_write(__traced_preempt_count, curr + val); >> } >> >> static inline void preempt_add_start_latency(int val) >> @@ -3200,8 +3206,12 @@ NOKPROBE_SYMBOL(preempt_count_add); >> */ >> void preempt_latency_stop(int val) >> { >> - if (preempt_count() == val) >> + int curr = this_cpu_read(__traced_preempt_count) - val; > > this_cpu_sub_return(); > >> + >> + if (!curr) >> trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip()); >> + >> + this_cpu_write(__traced_preempt_count, curr); >> } > > Can't say I love this, but it is miles better than the last patch. > ack! I will change the methods (and remove some blank lines here and there... :-)) Thanks Peter! -- Daniel ^ permalink raw reply [flat|nested] 27+ messages in thread
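Using the helpers Peter points at, the respin Daniel acks here might look roughly like the sketch below. This is only a guess at the rework, not a posted patch: this_cpu_add_return()/this_cpu_sub_return() return the new value, so the "was zero before" and "is zero now" checks become comparisons on the returned value.

---------------------------- %< ------------------------------
/* Rough sketch of the acked rework; not a posted patch. */
void preempt_latency_start(int val)
{
	/* Went from 0 to val: tracing-visible preemption just disabled. */
	if (this_cpu_add_return(__traced_preempt_count, val) == val) {
		unsigned long ip = get_lock_parent_ip();
#ifdef CONFIG_DEBUG_PREEMPT
		current->preempt_disable_ip = ip;
#endif
		trace_preempt_off(CALLER_ADDR0, ip);
	}
}

void preempt_latency_stop(int val)
{
	/* Dropped back to 0: tracing-visible preemption just re-enabled. */
	if (!this_cpu_sub_return(__traced_preempt_count, val))
		trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip());
}
---------------------------- >% ------------------------------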