LKML Archive on lore.kernel.org
* [RFC 0/3] preempt_tracer: Fix preempt_disable tracepoint
@ 2019-05-28 15:16 Daniel Bristot de Oliveira
  2019-05-28 15:16 ` [RFC 1/3] softirq: Use preempt_latency_stop/start to trace preemption Daniel Bristot de Oliveira
                   ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-05-28 15:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: williams, daniel, Steven Rostedt (VMware),
	Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Paul E. McKenney,
	Matthias Kaehlcke, Joel Fernandes (Google),
	Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta

While playing with the model and working on a fix for the task
context & trace recursion, I ended up hitting two cases in which the
preempt_disable/enable tracepoint was supposed to fire, but didn't.

There is an explanation for each case in the log message.

This is an RFC exposing the problem, along with possible ways to fix
it. But I bet there are better solutions as well, so this is a real
RFC.

Daniel Bristot de Oliveira (3):
  softirq: Use preempt_latency_stop/start to trace preemption
  preempt_tracer: Disable IRQ while starting/stopping due to a
    preempt_counter change
    preempt_tracer: Use a percpu variable to control traceable calls

 kernel/sched/core.c | 66 ++++++++++++++++++++++++++++++++++-----------
 kernel/softirq.c    | 13 ++++-----
 2 files changed, 55 insertions(+), 24 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RFC 1/3] softirq: Use preempt_latency_stop/start to trace preemption
  2019-05-28 15:16 [RFC 0/3] preempt_tracer: Fix preempt_disable tracepoint Daniel Bristot de Oliveira
@ 2019-05-28 15:16 ` Daniel Bristot de Oliveira
  2019-05-29  8:29   ` Peter Zijlstra
  2019-05-29  9:30   ` Joel Fernandes
  2019-05-28 15:16 ` [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change Daniel Bristot de Oliveira
  2019-05-28 15:16 ` [RFC 3/3] preempt_tracer: Use a percpu variable to control traceable calls Daniel Bristot de Oliveira
  2 siblings, 2 replies; 27+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-05-28 15:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: williams, daniel, Steven Rostedt (VMware),
	Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Paul E. McKenney,
	Matthias Kaehlcke, Joel Fernandes (Google),
	Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta

The preempt_disable/enable tracepoints fire only on the preemption
enabled <-> disabled transition. As preempt_latency_stop() and
preempt_latency_start() already implement this check, avoid code
duplication by using these functions in the softirq code as well.

RFC: Should we move preempt_latency_start/preempt_latency_stop
to a trace source file... or even a header?

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Yangtao Li <tiny.windzz@gmail.com>
Cc: Tommaso Cucinotta <tommaso.cucinotta@santannapisa.it>
Cc: linux-kernel@vger.kernel.org
---
 kernel/sched/core.c |  4 ++--
 kernel/softirq.c    | 13 +++++--------
 2 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 874c427742a9..8c0b414e45dc 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3152,7 +3152,7 @@ static inline void sched_tick_stop(int cpu) { }
  * If the value passed in is equal to the current preempt count
  * then we just disabled preemption. Start timing the latency.
  */
-static inline void preempt_latency_start(int val)
+void preempt_latency_start(int val)
 {
 	if (preempt_count() == val) {
 		unsigned long ip = get_lock_parent_ip();
@@ -3189,7 +3189,7 @@ NOKPROBE_SYMBOL(preempt_count_add);
  * If the value passed in equals to the current preempt count
  * then we just enabled preemption. Stop timing the latency.
  */
-static inline void preempt_latency_stop(int val)
+void preempt_latency_stop(int val)
 {
 	if (preempt_count() == val)
 		trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip());
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 2c3382378d94..c9ad89c3dfed 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -108,6 +108,8 @@ static bool ksoftirqd_running(unsigned long pending)
  * where hardirqs are disabled legitimately:
  */
 #ifdef CONFIG_TRACE_IRQFLAGS
+extern void preempt_latency_start(int val);
+extern void preempt_latency_stop(int val);
 void __local_bh_disable_ip(unsigned long ip, unsigned int cnt)
 {
 	unsigned long flags;
@@ -130,12 +132,8 @@ void __local_bh_disable_ip(unsigned long ip, unsigned int cnt)
 		trace_softirqs_off(ip);
 	raw_local_irq_restore(flags);
 
-	if (preempt_count() == cnt) {
-#ifdef CONFIG_DEBUG_PREEMPT
-		current->preempt_disable_ip = get_lock_parent_ip();
-#endif
-		trace_preempt_off(CALLER_ADDR0, get_lock_parent_ip());
-	}
+	preempt_latency_start(cnt);
+
 }
 EXPORT_SYMBOL(__local_bh_disable_ip);
 #endif /* CONFIG_TRACE_IRQFLAGS */
@@ -144,8 +142,7 @@ static void __local_bh_enable(unsigned int cnt)
 {
 	lockdep_assert_irqs_disabled();
 
-	if (preempt_count() == cnt)
-		trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip());
+	preempt_latency_stop(cnt);
 
 	if (softirq_count() == (cnt & SOFTIRQ_MASK))
 		trace_softirqs_on(_RET_IP_);
-- 
2.20.1



* [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change
  2019-05-28 15:16 [RFC 0/3] preempt_tracer: Fix preempt_disable tracepoint Daniel Bristot de Oliveira
  2019-05-28 15:16 ` [RFC 1/3] softirq: Use preempt_latency_stop/start to trace preemption Daniel Bristot de Oliveira
@ 2019-05-28 15:16 ` Daniel Bristot de Oliveira
  2019-05-29  8:33   ` Peter Zijlstra
  2019-05-28 15:16 ` [RFC 3/3] preempt_tracer: Use a percpu variable to control traceable calls Daniel Bristot de Oliveira
  2 siblings, 1 reply; 27+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-05-28 15:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: williams, daniel, Steven Rostedt (VMware),
	Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Paul E. McKenney,
	Matthias Kaehlcke, Joel Fernandes (Google),
	Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta

The preempt_disable/enable tracepoint only traces in the disable <-> enable
case, which is correct. But think about this case:

---------------------------- %< ------------------------------
	THREAD					IRQ
	   |					 |
preempt_disable() {
    __preempt_count_add(1)
	------->	    smp_apic_timer_interrupt() {
				preempt_disable()
				    do not trace (preempt count >= 1)
				    ....
				preempt_enable()
				    do not trace (preempt count >= 1)
			    }
    trace_preempt_disable();
}
---------------------------- >% ------------------------------

The tracepoint will be skipped.

It is possible to observe this problem using this kernel module:

	http://bristot.me/files/efficient_verification/wip.tar.gz [*]

and doing the following trace:

	# cd /sys/kernel/debug/tracing/
	# echo 1 > snapshot
	# cat available_events | grep preempt_ > set_event
	# echo irq_vectors >> /sys/kernel/debug/tracing/set_event
	# insmod wip.ko
	/* wait for a snapshot creation */
	# tail -100 snapshot
		[...]
	    tail-5572  [001] ....1..  2888.401184: preempt_enable: caller=_raw_spin_unlock_irqrestore+0x2a/0x70 parent=          (null)
	    tail-5572  [001] ....1..  2888.401184: preempt_disable: caller=migrate_disable+0x8b/0x1e0 parent=migrate_disable+0x8b/0x1e0
	    tail-5572  [001] ....111  2888.401184: preempt_enable: caller=migrate_disable+0x12f/0x1e0 parent=migrate_disable+0x12f/0x1e0
	    tail-5572  [001] d..h212  2888.401189: local_timer_entry: vector=236
	    tail-5572  [001] dN.h512  2888.401192: process_event: event sched_waking not expected in the state preemptive
	    tail-5572  [001] dN.h512  2888.401200: <stack trace>
	 => process_event
	 => __handle_event
	 => ttwu_do_wakeup
	 => try_to_wake_up
	 => invoke_rcu_core
	 => rcu_check_callbacks
	 => update_process_times
	 => tick_sched_handle
	 => tick_sched_timer
	 => __hrtimer_run_queues
	 => hrtimer_interrupt
	 => smp_apic_timer_interrupt
	 => apic_timer_interrupt
	 => trace_event_raw_event_preemptirq_template
	 => trace_preempt_off
	 => get_page_from_freelist
	 => __alloc_pages_nodemask
	 => __handle_mm_fault
	 => handle_mm_fault
	 => __do_page_fault
	 => do_page_fault
	 => async_page_fault

To avoid skipping the trace, the change to the counter must be "atomic"
with the start/stop, w.r.t. the interrupts.

Disable interrupts while adding/starting and stopping/subtracting.

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Yangtao Li <tiny.windzz@gmail.com>
Cc: Tommaso Cucinotta <tommaso.cucinotta@santannapisa.it>
Cc: linux-kernel@vger.kernel.org
---
[*] with some luck we will talk about this at Plumbers.

 kernel/sched/core.c | 48 +++++++++++++++++++++++++++++++++------------
 kernel/softirq.c    |  2 +-
 2 files changed, 37 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8c0b414e45dc..be4117d7384f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3163,6 +3163,16 @@ void preempt_latency_start(int val)
 	}
 }
 
+static inline void preempt_add_start_latency(int val)
+{
+	unsigned long flags;
+
+	raw_local_irq_save(flags);
+	__preempt_count_add(val);
+	preempt_latency_start(val);
+	raw_local_irq_restore(flags);
+}
+
 void preempt_count_add(int val)
 {
 #ifdef CONFIG_DEBUG_PREEMPT
@@ -3172,7 +3182,7 @@ void preempt_count_add(int val)
 	if (DEBUG_LOCKS_WARN_ON((preempt_count() < 0)))
 		return;
 #endif
-	__preempt_count_add(val);
+	preempt_add_start_latency(val);
 #ifdef CONFIG_DEBUG_PREEMPT
 	/*
 	 * Spinlock count overflowing soon?
@@ -3180,7 +3190,6 @@ void preempt_count_add(int val)
 	DEBUG_LOCKS_WARN_ON((preempt_count() & PREEMPT_MASK) >=
 				PREEMPT_MASK - 10);
 #endif
-	preempt_latency_start(val);
 }
 EXPORT_SYMBOL(preempt_count_add);
 NOKPROBE_SYMBOL(preempt_count_add);
@@ -3195,6 +3204,16 @@ void preempt_latency_stop(int val)
 		trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip());
 }
 
+static inline void preempt_sub_stop_latency(int val)
+{
+	unsigned long flags;
+
+	raw_local_irq_save(flags);
+	preempt_latency_stop(val);
+	__preempt_count_sub(val);
+	raw_local_irq_restore(flags);
+}
+
 void preempt_count_sub(int val)
 {
 #ifdef CONFIG_DEBUG_PREEMPT
@@ -3211,8 +3230,7 @@ void preempt_count_sub(int val)
 		return;
 #endif
 
-	preempt_latency_stop(val);
-	__preempt_count_sub(val);
+	preempt_sub_stop_latency(val);
 }
 EXPORT_SYMBOL(preempt_count_sub);
 NOKPROBE_SYMBOL(preempt_count_sub);
@@ -3220,6 +3238,16 @@ NOKPROBE_SYMBOL(preempt_count_sub);
 #else
 static inline void preempt_latency_start(int val) { }
 static inline void preempt_latency_stop(int val) { }
+
+static inline void preempt_sub_stop_latency(int val)
+{
+	__preempt_count_sub(val);
+}
+
+static inline void preempt_add_start_latency(int val)
+{
+	__preempt_count_add(val);
+}
 #endif
 
 static inline unsigned long get_preempt_disable_ip(struct task_struct *p)
@@ -3585,11 +3613,9 @@ static void __sched notrace preempt_schedule_common(void)
 		 * traced. The other to still record the preemption latency,
 		 * which can also be traced by the function tracer.
 		 */
-		preempt_disable_notrace();
-		preempt_latency_start(1);
+		preempt_add_start_latency(1);
 		__schedule(true);
-		preempt_latency_stop(1);
-		preempt_enable_no_resched_notrace();
+		preempt_sub_stop_latency(1);
 
 		/*
 		 * Check again in case we missed a preemption opportunity
@@ -3653,8 +3679,7 @@ asmlinkage __visible void __sched notrace preempt_schedule_notrace(void)
 		 * traced. The other to still record the preemption latency,
 		 * which can also be traced by the function tracer.
 		 */
-		preempt_disable_notrace();
-		preempt_latency_start(1);
+		preempt_add_start_latency(1);
 		/*
 		 * Needs preempt disabled in case user_exit() is traced
 		 * and the tracer calls preempt_enable_notrace() causing
@@ -3664,8 +3689,7 @@ asmlinkage __visible void __sched notrace preempt_schedule_notrace(void)
 		__schedule(true);
 		exception_exit(prev_ctx);
 
-		preempt_latency_stop(1);
-		preempt_enable_no_resched_notrace();
+		preempt_sub_stop_latency(1);
 	} while (need_resched());
 }
 EXPORT_SYMBOL_GPL(preempt_schedule_notrace);
diff --git a/kernel/softirq.c b/kernel/softirq.c
index c9ad89c3dfed..9c64522ecc76 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -130,9 +130,9 @@ void __local_bh_disable_ip(unsigned long ip, unsigned int cnt)
 	 */
 	if (softirq_count() == (cnt & SOFTIRQ_MASK))
 		trace_softirqs_off(ip);
-	raw_local_irq_restore(flags);
 
 	preempt_latency_start(cnt);
+	raw_local_irq_restore(flags);
 
 }
 EXPORT_SYMBOL(__local_bh_disable_ip);
-- 
2.20.1



* [RFC 3/3] preempt_tracer: Use a percpu variable to control traceable calls
  2019-05-28 15:16 [RFC 0/3] preempt_tracer: Fix preempt_disable tracepoint Daniel Bristot de Oliveira
  2019-05-28 15:16 ` [RFC 1/3] softirq: Use preempt_latency_stop/start to trace preemption Daniel Bristot de Oliveira
  2019-05-28 15:16 ` [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change Daniel Bristot de Oliveira
@ 2019-05-28 15:16 ` Daniel Bristot de Oliveira
  2019-05-29  8:41   ` Peter Zijlstra
  2 siblings, 1 reply; 27+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-05-28 15:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: williams, daniel, Steven Rostedt (VMware),
	Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Paul E. McKenney,
	Matthias Kaehlcke, Joel Fernandes (Google),
	Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta

The preempt_disable tracepoint only traces the disable <-> enable case,
which is correct. But think about this case:

--------------------------- %< ----------------------
	 THREAD					IRQ
	   |					 |
preempt_disable_notrace() {
    __preempt_count_add(1)
}

/* preemption disabled/IRQs enabled */

	------->		smp_apic_timer_interrupt() {
				    preempt_disable() {
					do not trace (preempt count >= 1)
				    }
				    ....
				    preempt_enable() {
						do not trace (preempt count >= 1)
				    }
	<-------		}
preempt_enable_notrace() {
    __preempt_count_sub(1)
}
--------------------------- >% ----------------------

The non-traceable preempt_disable can hide a legitimate preempt_disable
(which is worth tracing).

It is possible to observe this problem using this kernel module:

    http://bristot.me/files/efficient_verification/wip.tar.gz

and doing the following trace:

	# cd /sys/kernel/debug/tracing/
	# echo 1 > snapshot
	# cat available_events | grep preempt_ > set_event
	# echo irq_vectors >> /sys/kernel/debug/tracing/set_event
	# insmod wip.ko
	/* wait for a snapshot creation */
	# tail -100 snapshot
	    sshd-1159  [000] d...1..  2440.866116: preempt_enable: caller=migrate_enable+0x1bb/0x330 parent=migrate_enable+0x1bb/0x330
	    sshd-1159  [000] d..h1..  2440.866122: local_timer_entry: vector=236
	    sshd-1159  [000] d..h1..  2440.866127: local_timer_exit: vector=236
	    sshd-1159  [000] d.L.4..  2440.866129: process_event: event sched_waking not expected in the state preemptive
	    sshd-1159  [000] d.L.4..  2440.866137: <stack trace>
	 => process_event
	 => __handle_event
	 => ttwu_do_wakeup
	 => try_to_wake_up
	 => irq_exit
	 => smp_apic_timer_interrupt
	 => apic_timer_interrupt
	 => kvm_clock_read
	 => ktime_get_with_offset
	 => posix_get_boottime
	 => __x64_sys_clock_gettime
	 => do_syscall_64
	 => entry_SYSCALL_64_after_hwframe

and kvm_clock_read() disables preemption without tracing:

--------------------------- %< ----------------------
static u64 kvm_clock_read(void)
{
	u64 ret;

	preempt_disable_notrace();
	ret = pvclock_clocksource_read(this_cpu_pvti());
	preempt_enable_notrace();
	return ret;
}
--------------------------- >% ----------------------

Use a percpu variable to count the traced preempt_disable/enable calls,
and use it to decide whether to trace or not.

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Yangtao Li <tiny.windzz@gmail.com>
Cc: Tommaso Cucinotta <tommaso.cucinotta@santannapisa.it>
Cc: linux-kernel@vger.kernel.org
---
 kernel/sched/core.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index be4117d7384f..2e07d4174778 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3148,19 +3148,25 @@ static inline void sched_tick_stop(int cpu) { }
 
 #if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \
 				defined(CONFIG_TRACE_PREEMPT_TOGGLE))
+
+DEFINE_PER_CPU(int, __traced_preempt_count) = 0;
 /*
  * If the value passed in is equal to the current preempt count
  * then we just disabled preemption. Start timing the latency.
  */
 void preempt_latency_start(int val)
 {
-	if (preempt_count() == val) {
+	int curr = this_cpu_read(__traced_preempt_count);
+
+	if (!curr) {
 		unsigned long ip = get_lock_parent_ip();
 #ifdef CONFIG_DEBUG_PREEMPT
 		current->preempt_disable_ip = ip;
 #endif
 		trace_preempt_off(CALLER_ADDR0, ip);
 	}
+
+	this_cpu_write(__traced_preempt_count, curr + val);
 }
 
 static inline void preempt_add_start_latency(int val)
@@ -3200,8 +3206,12 @@ NOKPROBE_SYMBOL(preempt_count_add);
  */
 void preempt_latency_stop(int val)
 {
-	if (preempt_count() == val)
+	int curr = this_cpu_read(__traced_preempt_count) - val;
+
+	if (!curr)
 		trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip());
+
+	this_cpu_write(__traced_preempt_count, curr);
 }
 
 static inline void preempt_sub_stop_latency(int val)
-- 
2.20.1



* Re: [RFC 1/3] softirq: Use preempt_latency_stop/start to trace preemption
  2019-05-28 15:16 ` [RFC 1/3] softirq: Use preempt_latency_stop/start to trace preemption Daniel Bristot de Oliveira
@ 2019-05-29  8:29   ` Peter Zijlstra
  2019-05-29  9:30   ` Joel Fernandes
  1 sibling, 0 replies; 27+ messages in thread
From: Peter Zijlstra @ 2019-05-29  8:29 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: linux-kernel, williams, daniel, Steven Rostedt (VMware),
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	Matthias Kaehlcke, Joel Fernandes (Google),
	Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta

On Tue, May 28, 2019 at 05:16:22PM +0200, Daniel Bristot de Oliveira wrote:
> The preempt_disable/enable tracepoints fire only on the preemption
> enabled <-> disabled transition. As preempt_latency_stop() and
> preempt_latency_start() already implement this check, avoid code
> duplication by using these functions in the softirq code as well.
> 
> RFC: Should we move preempt_latency_start/preempt_latency_stop
> to a trace source file... or even a header?

Yeah, a header might not be a bad idea.

> @@ -130,12 +132,8 @@ void __local_bh_disable_ip(unsigned long ip, unsigned int cnt)
>  		trace_softirqs_off(ip);
>  	raw_local_irq_restore(flags);
>  
> -	if (preempt_count() == cnt) {
> -#ifdef CONFIG_DEBUG_PREEMPT
> -		current->preempt_disable_ip = get_lock_parent_ip();
> -#endif
> -		trace_preempt_off(CALLER_ADDR0, get_lock_parent_ip());
> -	}
> +	preempt_latency_start(cnt);
> +

I'm thinking we can do without that empty line.

>  }
>  EXPORT_SYMBOL(__local_bh_disable_ip);
>  #endif /* CONFIG_TRACE_IRQFLAGS */


* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change
  2019-05-28 15:16 ` [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change Daniel Bristot de Oliveira
@ 2019-05-29  8:33   ` Peter Zijlstra
  2019-05-29  9:40     ` Daniel Bristot de Oliveira
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2019-05-29  8:33 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: linux-kernel, williams, daniel, Steven Rostedt (VMware),
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	Matthias Kaehlcke, Joel Fernandes (Google),
	Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta

On Tue, May 28, 2019 at 05:16:23PM +0200, Daniel Bristot de Oliveira wrote:
> The preempt_disable/enable tracepoint only traces in the disable <-> enable
> case, which is correct. But think about this case:
> 
> ---------------------------- %< ------------------------------
> 	THREAD					IRQ
> 	   |					 |
> preempt_disable() {
>     __preempt_count_add(1)
> 	------->	    smp_apic_timer_interrupt() {
> 				preempt_disable()
> 				    do not trace (preempt count >= 1)
> 				    ....
> 				preempt_enable()
> 				    do not trace (preempt count >= 1)
> 			    }
>     trace_preempt_disable();
> }
> ---------------------------- >% ------------------------------
> 
> The tracepoint will be skipped.

.... for the IRQ. But IRQs are not preemptible anyway, so what's the
problem?

> To avoid skipping the trace, the change to the counter must be "atomic"
> with the start/stop, w.r.t. the interrupts.
> 
> Disable interrupts while adding/starting and stopping/subtracting.

> +static inline void preempt_add_start_latency(int val)
> +{
> +	unsigned long flags;
> +
> +	raw_local_irq_save(flags);
> +	__preempt_count_add(val);
> +	preempt_latency_start(val);
> +	raw_local_irq_restore(flags);
> +}

> +static inline void preempt_sub_stop_latency(int val)
> +{
> +	unsigned long flags;
> +
> +	raw_local_irq_save(flags);
> +	preempt_latency_stop(val);
> +	__preempt_count_sub(val);
> +	raw_local_irq_restore(flags);
> +}

That is hideously expensive :/


* Re: [RFC 3/3] preempt_tracer: Use a percpu variable to control traceable calls
  2019-05-28 15:16 ` [RFC 3/3] preempt_tracer: Use a percpu variable to control traceable calls Daniel Bristot de Oliveira
@ 2019-05-29  8:41   ` Peter Zijlstra
  2019-05-29  9:48     ` Daniel Bristot de Oliveira
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2019-05-29  8:41 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: linux-kernel, williams, daniel, Steven Rostedt (VMware),
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	Matthias Kaehlcke, Joel Fernandes (Google),
	Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta

On Tue, May 28, 2019 at 05:16:24PM +0200, Daniel Bristot de Oliveira wrote:
>  #if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \
>  				defined(CONFIG_TRACE_PREEMPT_TOGGLE))
> +
> +DEFINE_PER_CPU(int, __traced_preempt_count) = 0;
>  /*
>   * If the value passed in is equal to the current preempt count
>   * then we just disabled preemption. Start timing the latency.
>   */
>  void preempt_latency_start(int val)
>  {
> -	if (preempt_count() == val) {
> +	int curr = this_cpu_read(__traced_preempt_count);

We actually have this_cpu_add_return();

> +
> +	if (!curr) {
>  		unsigned long ip = get_lock_parent_ip();
>  #ifdef CONFIG_DEBUG_PREEMPT
>  		current->preempt_disable_ip = ip;
>  #endif
>  		trace_preempt_off(CALLER_ADDR0, ip);
>  	}
> +
> +	this_cpu_write(__traced_preempt_count, curr + val);
>  }
>  
>  static inline void preempt_add_start_latency(int val)
> @@ -3200,8 +3206,12 @@ NOKPROBE_SYMBOL(preempt_count_add);
>   */
>  void preempt_latency_stop(int val)
>  {
> -	if (preempt_count() == val)
> +	int curr = this_cpu_read(__traced_preempt_count) - val;

this_cpu_sub_return();

> +
> +	if (!curr)
>  		trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip());
> +
> +	this_cpu_write(__traced_preempt_count, curr);
>  }

Can't say I love this, but it is miles better than the last patch.


* Re: [RFC 1/3] softirq: Use preempt_latency_stop/start to trace preemption
  2019-05-28 15:16 ` [RFC 1/3] softirq: Use preempt_latency_stop/start to trace preemption Daniel Bristot de Oliveira
  2019-05-29  8:29   ` Peter Zijlstra
@ 2019-05-29  9:30   ` Joel Fernandes
  2019-05-29 12:22     ` Steven Rostedt
  1 sibling, 1 reply; 27+ messages in thread
From: Joel Fernandes @ 2019-05-29  9:30 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: linux-kernel, williams, daniel, Steven Rostedt (VMware),
	Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Paul E. McKenney,
	Matthias Kaehlcke, Frederic Weisbecker, Yangtao Li,
	Tommaso Cucinotta

On Tue, May 28, 2019 at 05:16:22PM +0200, Daniel Bristot de Oliveira wrote:
> The preempt_disable/enable tracepoints fire only on the preemption
> enabled <-> disabled transition. As preempt_latency_stop() and
> preempt_latency_start() already implement this check, avoid code
> duplication by using these functions in the softirq code as well.
> 
> RFC: Should we move preempt_latency_start/preempt_latency_stop
> to a trace source file... or even a header?

Yes, I think so. Also, this patch changes the CALLER_ADDR0 passed to the
tracepoint, because there's one more level of non-inlined function call
in the call chain, right? At the very least, the changelog should
document this change in functional behavior, IMO.

Looks like a nice change otherwise, thanks!


> Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Cc: Matthias Kaehlcke <mka@chromium.org>
> Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Yangtao Li <tiny.windzz@gmail.com>
> Cc: Tommaso Cucinotta <tommaso.cucinotta@santannapisa.it>
> Cc: linux-kernel@vger.kernel.org
> ---
>  kernel/sched/core.c |  4 ++--
>  kernel/softirq.c    | 13 +++++--------
>  2 files changed, 7 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 874c427742a9..8c0b414e45dc 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3152,7 +3152,7 @@ static inline void sched_tick_stop(int cpu) { }
>   * If the value passed in is equal to the current preempt count
>   * then we just disabled preemption. Start timing the latency.
>   */
> -static inline void preempt_latency_start(int val)
> +void preempt_latency_start(int val)
>  {
>  	if (preempt_count() == val) {
>  		unsigned long ip = get_lock_parent_ip();
> @@ -3189,7 +3189,7 @@ NOKPROBE_SYMBOL(preempt_count_add);
>   * If the value passed in equals to the current preempt count
>   * then we just enabled preemption. Stop timing the latency.
>   */
> -static inline void preempt_latency_stop(int val)
> +void preempt_latency_stop(int val)
>  {
>  	if (preempt_count() == val)
>  		trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip());
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index 2c3382378d94..c9ad89c3dfed 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -108,6 +108,8 @@ static bool ksoftirqd_running(unsigned long pending)
>   * where hardirqs are disabled legitimately:
>   */
>  #ifdef CONFIG_TRACE_IRQFLAGS
> +extern void preempt_latency_start(int val);
> +extern void preempt_latency_stop(int val);
>  void __local_bh_disable_ip(unsigned long ip, unsigned int cnt)
>  {
>  	unsigned long flags;
> @@ -130,12 +132,8 @@ void __local_bh_disable_ip(unsigned long ip, unsigned int cnt)
>  		trace_softirqs_off(ip);
>  	raw_local_irq_restore(flags);
>  
> -	if (preempt_count() == cnt) {
> -#ifdef CONFIG_DEBUG_PREEMPT
> -		current->preempt_disable_ip = get_lock_parent_ip();
> -#endif
> -		trace_preempt_off(CALLER_ADDR0, get_lock_parent_ip());
> -	}
> +	preempt_latency_start(cnt);
> +
>  }
>  EXPORT_SYMBOL(__local_bh_disable_ip);
>  #endif /* CONFIG_TRACE_IRQFLAGS */
> @@ -144,8 +142,7 @@ static void __local_bh_enable(unsigned int cnt)
>  {
>  	lockdep_assert_irqs_disabled();
>  
> -	if (preempt_count() == cnt)
> -		trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip());
> +	preempt_latency_stop(cnt);
>  
>  	if (softirq_count() == (cnt & SOFTIRQ_MASK))
>  		trace_softirqs_on(_RET_IP_);
> -- 
> 2.20.1
> 


* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change
  2019-05-29  8:33   ` Peter Zijlstra
@ 2019-05-29  9:40     ` Daniel Bristot de Oliveira
  2019-05-29 10:20       ` Peter Zijlstra
  2019-05-31  7:47       ` Joel Fernandes
  0 siblings, 2 replies; 27+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-05-29  9:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, williams, daniel, Steven Rostedt (VMware),
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	Matthias Kaehlcke, Joel Fernandes (Google),
	Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta

On 29/05/2019 10:33, Peter Zijlstra wrote:
> On Tue, May 28, 2019 at 05:16:23PM +0200, Daniel Bristot de Oliveira wrote:
>> The preempt_disable/enable tracepoint only traces in the disable <-> enable
>> case, which is correct. But think about this case:
>>
>> ---------------------------- %< ------------------------------
>> 	THREAD					IRQ
>> 	   |					 |
>> preempt_disable() {
>>     __preempt_count_add(1)
>> 	------->	    smp_apic_timer_interrupt() {
>> 				preempt_disable()
>> 				    do not trace (preempt count >= 1)
>> 				    ....
>> 				preempt_enable()
>> 				    do not trace (preempt count >= 1)
>> 			    }
>>     trace_preempt_disable();
>> }
>> ---------------------------- >% ------------------------------
>>
>> The tracepoint will be skipped.
> 
> .... for the IRQ. But IRQs are not preemptible anyway, so what's the
> problem?


right, they are.

Exposing my problem in a more specific way:

To show in a model that an event always takes place with preemption
disabled, but not necessarily with IRQs disabled, it is worth having the
preemption disable events separated from the IRQ disable ones.

The main reason is that, although disabling IRQs postpones the execution
of the scheduler, it is more pessimistic, as it also delays IRQs. So the
more precise the model is, the less pessimistic the analysis will be.

But there are other use-cases, for instance:

(Steve, correct me if I am wrong)

The preempt_tracer will not notice a "preempt disabled" section in an IRQ
handler if the problem above happens.

(Yeah, I know these problems are very specific... but...)

>> To avoid skipping the trace, the change to the counter must be "atomic"
>> with the start/stop, w.r.t. the interrupts.
>>
>> Disable interrupts while adding/starting and stopping/subtracting.
> 
>> +static inline void preempt_add_start_latency(int val)
>> +{
>> +	unsigned long flags;
>> +
>> +	raw_local_irq_save(flags);
>> +	__preempt_count_add(val);
>> +	preempt_latency_start(val);
>> +	raw_local_irq_restore(flags);
>> +}
> 
>> +static inline void preempt_sub_stop_latency(int val)
>> +{
>> +	unsigned long flags;
>> +
>> +	raw_local_irq_save(flags);
>> +	preempt_latency_stop(val);
>> +	__preempt_count_sub(val);
>> +	raw_local_irq_restore(flags);
>> +}
> 
> That is hideously expensive :/

Yeah... :-( Is there another way to provide such "atomicity"?

Can I use the argument "if one has these tracepoints enabled, they are not
considering it as a hot-path?"

-- Daniel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC 3/3] preempt_tracer: Use a percpu variable to control traceble calls
  2019-05-29  8:41   ` Peter Zijlstra
@ 2019-05-29  9:48     ` Daniel Bristot de Oliveira
  0 siblings, 0 replies; 27+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-05-29  9:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, williams, daniel, Steven Rostedt (VMware),
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	Matthias Kaehlcke, Joel Fernandes (Google),
	Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta

On 29/05/2019 10:41, Peter Zijlstra wrote:
> On Tue, May 28, 2019 at 05:16:24PM +0200, Daniel Bristot de Oliveira wrote:
>>  #if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \
>>  				defined(CONFIG_TRACE_PREEMPT_TOGGLE))
>> +
>> +DEFINE_PER_CPU(int, __traced_preempt_count) = 0;
>>  /*
>>   * If the value passed in is equal to the current preempt count
>>   * then we just disabled preemption. Start timing the latency.
>>   */
>>  void preempt_latency_start(int val)
>>  {
>> -	if (preempt_count() == val) {
>> +	int curr = this_cpu_read(__traced_preempt_count);
> 
> We actually have this_cpu_add_return();
> 
>> +
>> +	if (!curr) {
>>  		unsigned long ip = get_lock_parent_ip();
>>  #ifdef CONFIG_DEBUG_PREEMPT
>>  		current->preempt_disable_ip = ip;
>>  #endif
>>  		trace_preempt_off(CALLER_ADDR0, ip);
>>  	}
>> +
>> +	this_cpu_write(__traced_preempt_count, curr + val);
>>  }
>>  
>>  static inline void preempt_add_start_latency(int val)
>> @@ -3200,8 +3206,12 @@ NOKPROBE_SYMBOL(preempt_count_add);
>>   */
>>  void preempt_latency_stop(int val)
>>  {
>> -	if (preempt_count() == val)
>> +	int curr = this_cpu_read(__traced_preempt_count) - val;
> 
> this_cpu_sub_return();
> 
>> +
>> +	if (!curr)
>>  		trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip());
>> +
>> +	this_cpu_write(__traced_preempt_count, curr);
>>  }
> 
> Can't say I love this, but it is miles better than the last patch.
> 

ack! I will change the methods (and remove some blank lines here and there... :-))

Thanks Peter!

-- Daniel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change
  2019-05-29  9:40     ` Daniel Bristot de Oliveira
@ 2019-05-29 10:20       ` Peter Zijlstra
  2019-05-29 12:39         ` Steven Rostedt
  2019-05-29 13:51         ` Daniel Bristot de Oliveira
  2019-05-31  7:47       ` Joel Fernandes
  1 sibling, 2 replies; 27+ messages in thread
From: Peter Zijlstra @ 2019-05-29 10:20 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: linux-kernel, williams, daniel, Steven Rostedt (VMware),
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	Matthias Kaehlcke, Joel Fernandes (Google),
	Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta

On Wed, May 29, 2019 at 11:40:34AM +0200, Daniel Bristot de Oliveira wrote:
> On 29/05/2019 10:33, Peter Zijlstra wrote:
> > On Tue, May 28, 2019 at 05:16:23PM +0200, Daniel Bristot de Oliveira wrote:
> >> The preempt_disable/enable tracepoint only traces in the disable <-> enable
> >> case, which is correct. But think about this case:
> >>
> >> ---------------------------- %< ------------------------------
> >> 	THREAD					IRQ
> >> 	   |					 |
> >> preempt_disable() {
> >>     __preempt_count_add(1)
> >> 	------->	    smp_apic_timer_interrupt() {
> >> 				preempt_disable()
> >> 				    do not trace (preempt count >= 1)
> >> 				    ....
> >> 				preempt_enable()
> >> 				    do not trace (preempt count >= 1)
> >> 			    }
> >>     trace_preempt_disable();
> >> }
> >> ---------------------------- >% ------------------------------
> >>
> >> The tracepoint will be skipped.
> > 
> > .... for the IRQ. But IRQs are not preemptible anyway, so what's the
> > problem?
> 
> 
> right, they are.
> 
> exposing my problem in a more specific way:
> 
> To show in a model that an event always takes place with preemption disabled,
> but not necessarily with IRQs disabled, it is worth having the preemption
> disable events separated from IRQ disable ones.
> 
> > The main reason is that, although disabling IRQs postpones the execution of the
> > scheduler, it is more pessimistic, as it also delays IRQs. So the more precise
> the model is, the less pessimistic the analysis will be.

I'm not sure I follow, IRQs disabled fully implies !preemptible. I don't
see how the model would be more pessimistic than reality if it were to
use this knowledge.

Any !0 preempt_count(), which very much includes (Hard)IRQ and SoftIRQ
counts, means non-preemptible.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC 1/3] softirq: Use preempt_latency_stop/start to trace preemption
  2019-05-29  9:30   ` Joel Fernandes
@ 2019-05-29 12:22     ` Steven Rostedt
  2019-06-04 10:39       ` Daniel Bristot de Oliveira
  0 siblings, 1 reply; 27+ messages in thread
From: Steven Rostedt @ 2019-05-29 12:22 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Daniel Bristot de Oliveira, linux-kernel, williams, daniel,
	Ingo Molnar, Peter Zijlstra, Thomas Gleixner, Paul E. McKenney,
	Matthias Kaehlcke, Frederic Weisbecker, Yangtao Li,
	Tommaso Cucinotta

On Wed, 29 May 2019 05:30:56 -0400
Joel Fernandes <joel@joelfernandes.org> wrote:

> Yes, I think so. Also this patch changes CALLER_ADDR0 passed to the
> tracepoint because there's one more level of a non-inlined function call
> in the call chain right?  Very least the changelog should document this
> change in functional behavior, IMO.

This sounds more like a break in behavior, not a functional change. I
guess moving it to a header and making it a static __always_inline
should be fine though.

-- Steve

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change
  2019-05-29 10:20       ` Peter Zijlstra
@ 2019-05-29 12:39         ` Steven Rostedt
  2019-05-29 13:19           ` Peter Zijlstra
  2019-05-29 13:51         ` Daniel Bristot de Oliveira
  1 sibling, 1 reply; 27+ messages in thread
From: Steven Rostedt @ 2019-05-29 12:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Daniel Bristot de Oliveira, linux-kernel, williams, daniel,
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	Matthias Kaehlcke, Joel Fernandes (Google),
	Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta

On Wed, 29 May 2019 12:20:38 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, May 29, 2019 at 11:40:34AM +0200, Daniel Bristot de Oliveira wrote:
> > On 29/05/2019 10:33, Peter Zijlstra wrote:  
> > > On Tue, May 28, 2019 at 05:16:23PM +0200, Daniel Bristot de Oliveira wrote:  
> > >> The preempt_disable/enable tracepoint only traces in the disable <-> enable
> > >> case, which is correct. But think about this case:
> > >>
> > >> ---------------------------- %< ------------------------------
> > >> 	THREAD					IRQ
> > >> 	   |					 |
> > >> preempt_disable() {
> > >>     __preempt_count_add(1)  
> > >> 	------->	    smp_apic_timer_interrupt() {  
> > >> 				preempt_disable()
> > >> 				    do not trace (preempt count >= 1)
> > >> 				    ....
> > >> 				preempt_enable()
> > >> 				    do not trace (preempt count >= 1)
> > >> 			    }
> > >>     trace_preempt_disable();
> > >> }  
> > >> ---------------------------- >% ------------------------------  
> > >>
> > >> The tracepoint will be skipped.  
> > > 
> > > .... for the IRQ. But IRQs are not preemptible anyway, so what's the
> > > problem?
> > 
> > 
> > right, they are.
> > 
> > exposing my problem in a more specific way:
> > 
> > To show in a model that an event always takes place with preemption disabled,
> > but not necessarily with IRQs disabled, it is worth having the preemption
> > disable events separated from IRQ disable ones.
> > 
> > The main reason is that, although disabling IRQs postpones the execution of the
> > scheduler, it is more pessimistic, as it also delays IRQs. So the more precise
> > the model is, the less pessimistic the analysis will be.  
> 
> I'm not sure I follow, IRQs disabled fully implies !preemptible. I don't
> see how the model would be more pessimistic than reality if it were to
> use this knowledge.
> 
> Any !0 preempt_count(), which very much includes (Hard)IRQ and SoftIRQ
> counts, means non-preemptible.

I believe I see what Daniel is talking about, but I hate the proposed
solution ;-)

First, if you care about real times that the CPU can't preempt
(preempt_count != 0 or interrupts disabled), then you want the
preempt_irqsoff tracer. The preempt_tracer is more academic where it
just shows you when we disable preemption via the counter. But even
with the preempt_irqsoff tracer you may not get the full length of time
due to the above explained race.

	__preempt_count_add(1) <-- CPU now prevents preemption

	<interrupt>
		---->
			trace_hardirqs_off() <-- Start preempt disable timer
			handler();
			trace_hardirqs_on() <-- Stop preempt disable timer (Diff A)
		<----
	trace_preempt_disable() <-- Start preempt disable timer

	[..]

	trace_preempt_enable() <-- Stop preempt disable timer (Diff B)

	__preempt_count_sub(1) <-- CPU re-enables preemption

The preempt_irqsoff tracer will break this into two parts: Diff A and
Diff B, when in reality, the total time the CPU prevented preemption
was A + B.

Note, we can have the same race if an interrupt comes in just after
calling trace_preempt_enable() and before the __preempt_count_sub().

What I would recommend is adding a flag to the task_struct that gets
set before the __preempt_count_add() and cleared by the tracing
function. If an interrupt goes off during this time, it will start the
total time to record, and not end it on the trace_hardirqs_on() part.
Now since we set this flag before disabling preemption, what if we get
preempted before calling __preempt_count_add()? Simple, have a hook in
the scheduler (just connect to the sched_switch tracepoint) that checks
that flag, and if it is set, it ends the preempt disable recording
time. Also on scheduling that task back in, if that flag is set, start
the preempt disable timer.

It may get a bit more complex, but we can contain that complexity
within the tracing code, and we don't have to do added irq disabling.

-- Steve

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change
  2019-05-29 12:39         ` Steven Rostedt
@ 2019-05-29 13:19           ` Peter Zijlstra
  2019-05-29 13:29             ` Peter Zijlstra
  2019-05-29 13:42             ` Steven Rostedt
  0 siblings, 2 replies; 27+ messages in thread
From: Peter Zijlstra @ 2019-05-29 13:19 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Daniel Bristot de Oliveira, linux-kernel, williams, daniel,
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	Matthias Kaehlcke, Joel Fernandes (Google),
	Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta

On Wed, May 29, 2019 at 08:39:30AM -0400, Steven Rostedt wrote:
> I believe I see what Daniel is talking about, but I hate the proposed
> solution ;-)
> 
> First, if you care about real times that the CPU can't preempt
> (preempt_count != 0 or interrupts disabled), then you want the
> preempt_irqsoff tracer. The preempt_tracer is more academic where it
> just shows you when we disable preemption via the counter. But even
> with the preempt_irqsoff tracer you may not get the full length of time
> due to the above explained race.

IOW, that tracer gives a completely 'make believe' number? What's the
point? Just delete the pure preempt tracer.

And the preempt_irqoff tracer had better also consume the IRQ events,
and if it does that it can DTRT without extra bits on, even with that
race.

Consider:

	preempt_disable()
	  preempt_count += 1;
	  <IRQ>
	    trace_irq_enter();

	    trace_irq_exit();
	  </IRQ>
	  trace_preempt_disable();

	/* does stuff */

	preempt_enable()
	  preempt_count -= 1;
	  trace_preempt_enable();

You're saying preempt_irqoff() fails to connect the two because of the
hole between trace_irq_exit() and trace_preempt_disable() ?

But trace_irq_exit() can see the raised preempt_count and set state for
trace_preempt_disable() to connect.

> What I would recommend is adding a flag to the task_struct that gets
> set before the __preempt_count_add() and cleared by the tracing
> function. If an interrupt goes off during this time, it will start the
> total time to record, and not end it on the trace_hardirqs_on() part.
> Now since we set this flag before disabling preemption, what if we get
> preempted before calling __preempt_count_add()?. Simple, have a hook in
> the scheduler (just connect to the sched_switch tracepoint) that checks
> that flag, and if it is set, it ends the preempt disable recording
> time. Also on scheduling that task back in, if that flag is set, start
> the preempt disable timer.

I don't think that works, you also have to consider softirq. And yes you
can make it more complicated, but I still don't see the point.

And none of this is relevant for Daniel's model stuff. He just needs to
consider in-IRQ as !preempt.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change
  2019-05-29 13:19           ` Peter Zijlstra
@ 2019-05-29 13:29             ` Peter Zijlstra
  2019-05-29 13:42             ` Steven Rostedt
  1 sibling, 0 replies; 27+ messages in thread
From: Peter Zijlstra @ 2019-05-29 13:29 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Daniel Bristot de Oliveira, linux-kernel, williams, daniel,
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	Matthias Kaehlcke, Joel Fernandes (Google),
	Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta

On Wed, May 29, 2019 at 03:19:57PM +0200, Peter Zijlstra wrote:
> On Wed, May 29, 2019 at 08:39:30AM -0400, Steven Rostedt wrote:
> > I believe I see what Daniel is talking about, but I hate the proposed
> > solution ;-)
> > 
> > First, if you care about real times that the CPU can't preempt
> > (preempt_count != 0 or interrupts disabled), then you want the
> > preempt_irqsoff tracer. The preempt_tracer is more academic where it
> > just shows you when we disable preemption via the counter. But even
> > with the preempt_irqsoff tracer you may not get the full length of time
> > due to the above explained race.
> 
> IOW, that tracer gives a completely 'make believe' number? What's the
> point? Just delete the pure preempt tracer.

Alternatively, fix the preempt tracer by having it completely disregard
IRQs.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change
  2019-05-29 13:19           ` Peter Zijlstra
  2019-05-29 13:29             ` Peter Zijlstra
@ 2019-05-29 13:42             ` Steven Rostedt
  2019-05-29 13:49               ` Peter Zijlstra
  1 sibling, 1 reply; 27+ messages in thread
From: Steven Rostedt @ 2019-05-29 13:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Daniel Bristot de Oliveira, linux-kernel, williams, daniel,
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	Matthias Kaehlcke, Joel Fernandes (Google),
	Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta

On Wed, 29 May 2019 15:19:57 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, May 29, 2019 at 08:39:30AM -0400, Steven Rostedt wrote:
> > I believe I see what Daniel is talking about, but I hate the proposed
> > solution ;-)
> > 
> > First, if you care about real times that the CPU can't preempt
> > (preempt_count != 0 or interrupts disabled), then you want the
> > preempt_irqsoff tracer. The preempt_tracer is more academic where it
> > just shows you when we disable preemption via the counter. But even
> > with the preempt_irqsoff tracer you may not get the full length of time
> > due to the above explained race.  
> 
> IOW, that tracer gives a completely 'make believe' number? What's the
> point? Just delete the pure preempt tracer.

The preempt_tracer is there as part of the preempt_irqsoff tracer
implementation. By removing it, the only code we would remove is
displaying preemptoff as a tracer. I stated this when it was created,
that it was more of an academic exercise if you use it, but that code
was required to get preempt_irqsoff working.

> 
> And the preempt_irqoff tracer had better also consume the IRQ events,
> and if it does that it can DTRT without extra bits on, even with that
> race.
> 
> Consider:
> 
> 	preempt_disable()
> 	  preempt_count += 1;
> 	  <IRQ>
> 	    trace_irq_enter();
> 
> 	    trace_irq_exit();
> 	  </IRQ>
> 	  trace_preempt_disable();
> 
> 	/* does stuff */
> 
> 	preempt_enable()
> 	  preempt_count -= 1;
> 	  trace_preempt_enable();
> 
> You're saying preempt_irqoff() fails to connect the two because of the
> hole between trace_irq_exit() and trace_preempt_disable() ?
> 
> But trace_irq_exit() can see the raised preempt_count and set state
> for trace_preempt_disable() to connect.

That's basically what I was suggesting as the solution to this ;-)

> 
> > What I would recommend is adding a flag to the task_struct that gets
> > set before the __preempt_count_add() and cleared by the tracing
> > function. If an interrupt goes off during this time, it will start
> > the total time to record, and not end it on the trace_hardirqs_on()
> > part. Now since we set this flag before disabling preemption, what
> > if we get preempted before calling __preempt_count_add()? Simple,
> > have a hook in the scheduler (just connect to the sched_switch
> > tracepoint) that checks that flag, and if it is set, it ends the
> > preempt disable recording time. Also on scheduling that task back
> > in, if that flag is set, start the preempt disable timer.  
> 
> I don't think that works, you also have to consider softirq. And yes
> you can make it more complicated, but I still don't see the point.

Note, there are places that disable preemption without being traced. If
we trigger only on preemption being disabled and start the "timer",
there may not be any code to stop it. That was why I recommended the
flag in the code that starts the timing.

> 
> And none of this is relevant for Daniel's model stuff. He just needs to
> consider in-IRQ as !preempt.

But he does bring up an issue that preempt_irqsoff fails to handle.

-- Steve

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change
  2019-05-29 13:42             ` Steven Rostedt
@ 2019-05-29 13:49               ` Peter Zijlstra
  2019-05-29 13:58                 ` Steven Rostedt
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2019-05-29 13:49 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Daniel Bristot de Oliveira, linux-kernel, williams, daniel,
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	Matthias Kaehlcke, Joel Fernandes (Google),
	Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta

On Wed, May 29, 2019 at 09:42:13AM -0400, Steven Rostedt wrote:
> > And the preempt_irqoff tracer had better also consume the IRQ events,
> > and if it does that it can DTRT without extra bits on, even with that
> > race.
> > 
> > Consider:
> > 
> > 	preempt_disable()
> > 	  preempt_count += 1;
> > 	  <IRQ>
> > 	    trace_irq_enter();
> > 
> > 	    trace_irq_exit();
> > 	  </IRQ>
> > 	  trace_preempt_disable();
> > 
> > 	/* does stuff */
> > 
> > 	preempt_enable()
> > 	  preempt_count -= 1;
> > 	  trace_preempt_enable();
> > 
> > You're saying preempt_irqoff() fails to connect the two because of the
> > hole between trace_irq_exit() and trace_preempt_disable() ?
> > 
> > But trace_irq_exit() can see the raised preempt_count and set state
> > for trace_preempt_disable() to connect.
> 
> That's basically what I was suggesting as the solution to this ;-)

You were wanting changes to preempt_disable() and task_struct, neither
of which is required. The above only needs some per-cpu storage in the
tracer implementation.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change
  2019-05-29 10:20       ` Peter Zijlstra
  2019-05-29 12:39         ` Steven Rostedt
@ 2019-05-29 13:51         ` Daniel Bristot de Oliveira
  2019-05-29 18:21           ` Peter Zijlstra
  1 sibling, 1 reply; 27+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-05-29 13:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, williams, daniel, Steven Rostedt (VMware),
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	Matthias Kaehlcke, Joel Fernandes (Google),
	Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta

On 29/05/2019 12:20, Peter Zijlstra wrote:
> On Wed, May 29, 2019 at 11:40:34AM +0200, Daniel Bristot de Oliveira wrote:
>> On 29/05/2019 10:33, Peter Zijlstra wrote:
>>> On Tue, May 28, 2019 at 05:16:23PM +0200, Daniel Bristot de Oliveira wrote:
>>>> The preempt_disable/enable tracepoint only traces in the disable <-> enable
>>>> case, which is correct. But think about this case:
>>>>
>>>> ---------------------------- %< ------------------------------
>>>> 	THREAD					IRQ
>>>> 	   |					 |
>>>> preempt_disable() {
>>>>     __preempt_count_add(1)
>>>> 	------->	    smp_apic_timer_interrupt() {
>>>> 				preempt_disable()
>>>> 				    do not trace (preempt count >= 1)
>>>> 				    ....
>>>> 				preempt_enable()
>>>> 				    do not trace (preempt count >= 1)
>>>> 			    }
>>>>     trace_preempt_disable();
>>>> }
>>>> ---------------------------- >% ------------------------------
>>>>
>>>> The tracepoint will be skipped.
>>>
>>> .... for the IRQ. But IRQs are not preemptible anyway, so what the
>>> problem?
>>
>>
>> right, they are.
>>
>> exposing my problem in a more specific way:
>>
>> To show in a model that an event always takes place with preemption disabled,
>> but not necessarily with IRQs disabled, it is worth having the preemption
>> disable events separated from IRQ disable ones.
>>
>> The main reason is that, although disabling IRQs postpones the execution of the
>> scheduler, it is more pessimistic, as it also delays IRQs. So the more precise
>> the model is, the less pessimistic the analysis will be.
> 
> I'm not sure I follow, IRQs disabled fully implies !preemptible. I don't
> see how the model would be more pessimistic than reality if it were to
> use this knowledge.

Maybe I did not expressed myself well... and the example was not good either.

"IRQs disabled fully implies !preemptible" is a "too big" step. In modeling (or
mathematical reasoning?), a good practice is to break the properties into small
pieces, and then build more complex reasoning/implications using these "small
properties."

Doing "big steps" makes you prone to "misinterpretations", creating ambiguity.
Then, -RT people are prone to be pessimistic, non-RT optimistic, and so on... and
that is what models try to avoid.

For instance, explaining this using words is contradictory:
> Any !0 preempt_count(), which very much includes (Hard)IRQ and SoftIRQ
> counts, means non-preemptible.

One might argue that the preemption of a thread always takes place with
preempt_count() != 0, because __schedule() is always called with preemption
disabled, so the preemption takes place while non-preemptible.

A more elaborated example:

------------------ %< --------------------------
Thread A is running, and goes to sleep waiting for a timer...
	schedule() {
		preempt_disable();
		__schedule() {
			smp_apic_timer_interrupt() {
				sched_wakeup (Thread A);
				sched_wakeup (Thread B: highest prio) {
					sched_set_need_resched();
				}
			}
			local_irq_disable()
			context switch to B, A leaves in state=R.
------------------ %< --------------------------

In this case, the thread A suffered a "preemption" with "!0 preempt_count()"

The fact is, Linux does not fit neatly into the "well known terminology" of
academic papers, because much of that terminology is based on the fact that
operations are atomic. But they are not, and Linux has some behaviors that
call for new terminology/interpretation...

- WAIT! But you (daniel) want to fake the atomicity between preempt_disable and
its tracepoint!

Yes, I do, but this is a very straightforward step/assumption: the atomicity is
about the real event and the tracepoint that notifies it. It is not about two
different events.

That is why it is worth letting the modeling rules clarify the behavior of the
system, without making non-obvious implications in the code, so we can have a
model that fits better with the Linux actions/events and avoids ambiguity.

[ note 1: the tracepoint is only enabled if CONFIG_PREEMPTIRQ_TRACEPOINTS=y
 which is not enabled by default ]

[ note 2: I just saw that Steven replied while I was writing this email... I
will read them now... sorry for some repetitive topic here ]

-- Daniel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change
  2019-05-29 13:49               ` Peter Zijlstra
@ 2019-05-29 13:58                 ` Steven Rostedt
  0 siblings, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2019-05-29 13:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Daniel Bristot de Oliveira, linux-kernel, williams, daniel,
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	Matthias Kaehlcke, Joel Fernandes (Google),
	Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta

On Wed, 29 May 2019 15:49:46 +0200
Peter Zijlstra <peterz@infradead.org> wrote:


> > That's basically what I was suggesting as the solution to this ;-)  
> 
> You were wanting changes to preempt_disable() and task_struct, neither
> of which is required. The above only needs some per-cpu storage in the
> tracer implementation.

The only changes were to the trace preempt_disable() code, which I
still think needs to be done regardless, to differentiate between when
we are tracing preempt disable and when we are not.

And no modification would need to be done to task_struct as we already
have a place to add tracing flags.

-- Steve




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change
  2019-05-29 13:51         ` Daniel Bristot de Oliveira
@ 2019-05-29 18:21           ` Peter Zijlstra
  2019-06-04 10:20             ` Daniel Bristot de Oliveira
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2019-05-29 18:21 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: linux-kernel, williams, daniel, Steven Rostedt (VMware),
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	Matthias Kaehlcke, Joel Fernandes (Google),
	Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta

On Wed, May 29, 2019 at 03:51:31PM +0200, Daniel Bristot de Oliveira wrote:
> On 29/05/2019 12:20, Peter Zijlstra wrote:

> > I'm not sure I follow, IRQs disabled fully implies !preemptible. I don't
> > see how the model would be more pessimistic than reality if it were to
> > use this knowledge.
> 
> Maybe I did not expressed myself well... and the example was not good either.
> 
> "IRQs disabled fully implies !preemptible" is a "too big" step. In modeling (or
> mathematical reasoning?), a good practice is to break the properties into small
> pieces, and then build more complex reasoning/implications using these "small
> properties."
> 
> Doing "big steps" makes you prone to "misinterpretations", creating ambiguity.
> Then, -RT people are prone to be pessimistic, non-RT optimistic, and so on... and
> that is what models try to avoid.

You already construct the big model out of small generators; this is
just one more of those little generators.

> For instance, explaining this using words is contradictory:
> > Any !0 preempt_count(), which very much includes (Hard)IRQ and SoftIRQ
> > counts, means non-preemptible.
> 
> One might argue that the preemption of a thread always takes place with
> preempt_count() != 0, because __schedule() is always called with preemption
> disabled, so the preemption takes place while non-preemptible.

Yeah, I know about that one; you've used it in your talks. Also, you've
modeled the schedule preempt disable as a special state. If you want we
can actually make it a special bit in the preempt_count word too, the
patch shouldn't be too hard, although it would make no practical
difference.

> - WAIT! But you (daniel) want to fake the atomicity between preempt_disable and
> its tracepoint!
> 
> Yes, I do, but this is a very straightforward step/assumption: the atomicity is
> about the real event and the tracepoint that notifies it. It is not about two
> different events.
> 
> That is why it is worth letting the modeling rules clarify the behavior of the
> system, without making non-obvious implications in the code, so we can have a
> model that fits better with the Linux actions/events and avoids ambiguity.

You can easily build a little shim between the model and the tracehooks
that fixes this up if you don't want to stick it in the model proper.

All the information is there. Heck you can even do that 3/3 thing
internally I think.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change
  2019-05-29  9:40     ` Daniel Bristot de Oliveira
  2019-05-29 10:20       ` Peter Zijlstra
@ 2019-05-31  7:47       ` Joel Fernandes
  2019-06-04 10:12         ` Daniel Bristot de Oliveira
  1 sibling, 1 reply; 27+ messages in thread
From: Joel Fernandes @ 2019-05-31  7:47 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: Peter Zijlstra, linux-kernel, williams, daniel,
	Steven Rostedt (VMware),
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	Matthias Kaehlcke, Frederic Weisbecker, Yangtao Li,
	Tommaso Cucinotta

On Wed, May 29, 2019 at 11:40:34AM +0200, Daniel Bristot de Oliveira wrote:
> On 29/05/2019 10:33, Peter Zijlstra wrote:
> > On Tue, May 28, 2019 at 05:16:23PM +0200, Daniel Bristot de Oliveira wrote:
> >> The preempt_disable/enable tracepoint only traces in the disable <-> enable
> >> case, which is correct. But think about this case:
> >>
> >> ---------------------------- %< ------------------------------
> >> 	THREAD					IRQ
> >> 	   |					 |
> >> preempt_disable() {
> >>     __preempt_count_add(1)
> >> 	------->	    smp_apic_timer_interrupt() {
> >> 				preempt_disable()
> >> 				    do not trace (preempt count >= 1)
> >> 				    ....
> >> 				preempt_enable()
> >> 				    do not trace (preempt count >= 1)
> >> 			    }
> >>     trace_preempt_disable();
> >> }
> >> ---------------------------- >% ------------------------------
> >>
> >> The tracepoint will be skipped.
> > 
> > .... for the IRQ. But IRQs are not preemptible anyway, so what's the
> > problem?
> 
> 
> right, they are.
> 
> exposing my problem in a more specific way:
> 
> To show in a model that an event always takes place with preemption disabled,
> but not necessarily with IRQs disabled, it is worth having the preemption
> disable events separated from IRQ disable ones.
> 
> > The main reason is that, although disabling IRQs postpones the execution of the
> > scheduler, it is more pessimistic, as it also delays IRQs. So the more precise
> the model is, the less pessimistic the analysis will be.
> 
> But there are other use-cases, for instance:
> 
> (Steve, correct me if I am wrong)
> 
> The preempt_tracer will not notice a "preempt disabled" section in an IRQ
> handler if the problem above happens.
> 
> (Yeah, I know these problems are very specific... but...)

I agree with the problem. I think Daniel does not want to miss the preemption
disabled event caused by the IRQ disabling.

> >> To avoid skipping the trace, the change in the counter should be "atomic"
> >> with the start/stop, w.r.t. the interrupts.
> >>
> >> Disable interrupts around the add/start and stop/sub operations.
> > 
> >> +static inline void preempt_add_start_latency(int val)
> >> +{
> >> +	unsigned long flags;
> >> +
> >> +	raw_local_irq_save(flags);
> >> +	__preempt_count_add(val);
> >> +	preempt_latency_start(val);
> >> +	raw_local_irq_restore(flags);
> >> +}
> > 
> >> +static inline void preempt_sub_stop_latency(int val)
> >> +{
> >> +	unsigned long flags;
> >> +
> >> +	raw_local_irq_save(flags);
> >> +	preempt_latency_stop(val);
> >> +	__preempt_count_sub(val);
> >> +	raw_local_irq_restore(flags);
> >> +}
> > 
> > That is hideously expensive :/
> 
> Yeah... :-( Is there another way to provide such "atomicity"?
> 
> Can I use the argument "if one has these tracepoints enabled, they are not
> considering it as a hot-path?"

The only addition here seems to be the raw_local_irq_{save,restore} around the
calls to increment the preempt counter and start the latency tracking.

Is there any performance data with the tracepoint enabled and with/without
this patch? Like with hackbench?

Thanks.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change
  2019-05-31  7:47       ` Joel Fernandes
@ 2019-06-04 10:12         ` Daniel Bristot de Oliveira
  2019-06-04 15:01           ` Steven Rostedt
  0 siblings, 1 reply; 27+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-06-04 10:12 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Peter Zijlstra, linux-kernel, williams, daniel,
	Steven Rostedt (VMware),
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	Matthias Kaehlcke, Frederic Weisbecker, Yangtao Li,
	Tommaso Cucinotta

On 31/05/2019 09:47, Joel Fernandes wrote:
> On Wed, May 29, 2019 at 11:40:34AM +0200, Daniel Bristot de Oliveira wrote:
>> On 29/05/2019 10:33, Peter Zijlstra wrote:
>>> On Tue, May 28, 2019 at 05:16:23PM +0200, Daniel Bristot de Oliveira wrote:
>>>> The preempt_disable/enable tracepoint only traces in the disable <-> enable
>>>> case, which is correct. But think about this case:
>>>>
>>>> ---------------------------- %< ------------------------------
>>>> 	THREAD					IRQ
>>>> 	   |					 |
>>>> preempt_disable() {
>>>>     __preempt_count_add(1)
>>>> 	------->	    smp_apic_timer_interrupt() {
>>>> 				preempt_disable()
>>>> 				    do not trace (preempt count >= 1)
>>>> 				    ....
>>>> 				preempt_enable()
>>>> 				    do not trace (preempt count >= 1)
>>>> 			    }
>>>>     trace_preempt_disable();
>>>> }
>>>> ---------------------------- >% ------------------------------
>>>>
>>>> The tracepoint will be skipped.
>>>
>>> .... for the IRQ. But IRQs are not preemptible anyway, so what's the
>>> problem?
>>
>>
>> right, they are.
>>
>> exposing my problem in a more specific way:
>>
>> To show in a model that an event always takes place with preemption disabled,
>> but not necessarily with IRQs disabled, it is worth having the preemption
>> disable events separated from IRQ disable ones.
>>
>> The main reason is that, although disabling IRQs postpones the execution of the
>> scheduler, it is more pessimistic, as it also delays IRQs. So the more precise
>> the model is, the less pessimistic the analysis will be.
>>
>> But there are other use-cases, for instance:
>>
>> (Steve, correct me if I am wrong)
>>
>> The preempt_tracer will not notice a "preempt disabled" section in an IRQ
>> handler if the problem above happens.
>>
>> (Yeah, I know these problems are very specific... but...)
> 
> I agree with the problem. I think Daniel does not want to miss the preemption
> disabled event caused by the IRQ disabling.

Hi Joel!

Correct, but ... look below.

>>>> To avoid skipping the trace, the change in the counter should be "atomic"
>>>> with the start/stop, w.r.t the interrupts.
>>>>
>>>> Disable interrupts while the adding/starting stopping/subtracting.
>>>
>>>> +static inline void preempt_add_start_latency(int val)
>>>> +{
>>>> +	unsigned long flags;
>>>> +
>>>> +	raw_local_irq_save(flags);
>>>> +	__preempt_count_add(val);
>>>> +	preempt_latency_start(val);
>>>> +	raw_local_irq_restore(flags);
>>>> +}
>>>
>>>> +static inline void preempt_sub_stop_latency(int val)
>>>> +{
>>>> +	unsigned long flags;
>>>> +
>>>> +	raw_local_irq_save(flags);
>>>> +	preempt_latency_stop(val);
>>>> +	__preempt_count_sub(val);
>>>> +	raw_local_irq_restore(flags);
>>>> +}
>>>
>>> That is hideously expensive :/
>>
>> Yeah... :-( Is there another way to provide such "atomicity"?
>>
>> Can I use the argument "if one has these tracepoints enabled, they are not
>> considering it as a hot-path?"
> 
> The only addition here seems to be the raw_local_irq_{save,restore} around the
> calls to increment the preempt counter and start the latency tracking.
> 
> Is there any performance data with the tracepoint enabled and with/without
> this patch? Like with hackbench?

I discussed this with Steve at the Summit on the Summit (the reason I did
not reply to this email earlier is that I was at the conf/traveling), and he
also agrees with peterz: disabling and (mainly) re-enabling IRQs costs too much.

We need to find another way to resolve this problem (or mitigate the cost).... :-(.

Ideas?

Thanks!!

-- Daniel


* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change
  2019-05-29 18:21           ` Peter Zijlstra
@ 2019-06-04 10:20             ` Daniel Bristot de Oliveira
  0 siblings, 0 replies; 27+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-06-04 10:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, williams, daniel, Steven Rostedt (VMware),
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	Matthias Kaehlcke, Joel Fernandes (Google),
	Frederic Weisbecker, Yangtao Li, Tommaso Cucinotta

On 29/05/2019 20:21, Peter Zijlstra wrote:
> On Wed, May 29, 2019 at 03:51:31PM +0200, Daniel Bristot de Oliveira wrote:
>> On 29/05/2019 12:20, Peter Zijlstra wrote:
> 
>>> I'm not sure I follow, IRQs disabled fully implies !preemptible. I don't
>>> see how the model would be more pessimistic than reality if it were to
>>> use this knowledge.
>>
>> Maybe I did not expressed myself well... and the example was not good either.
>>
>> "IRQs disabled fully implies !preemptible" is a "too big" step. In modeling (or
>> mathematical reasoning?), a good practice is to break the properties into small
>> pieces, and then build more complex reasoning/implications using these "small
>> properties."
>>
>> Doing "big steps" makes you prone to "misinterpretations", creating ambiguity.
>> Then, -RT people are prone to be pessimistic, non-RT optimistic, and so on... and
>> that is what models try to avoid.
> 
> You already construct the big model out of small generators, this is
> just one more of those little generators.

Yes, we can take that way too...

>> For instance, explaining this using words is contradictory:
>>> Any !0 preempt_count(), which very much includes (Hard)IRQ and SoftIRQ
>>> counts, means non-preemptible.
>>
>> One might argue that the preemption of a thread always takes place with
>> preempt_count() != 0, because __schedule() is always called with preemption
>> disabled, so the preemption takes place in a non-preemptible context.
> 
> Yeah, I know about that one; you've used it in your talks. Also, you've
> modeled the schedule preempt disable as a special state. If you want we
> can actually make it a special bit in the preempt_count word too, the
> patch shouldn't be too hard, although it would make no practical
> difference.

Good to know! I am trying not to change the code or abstractions. In the
user-space version I was using the caller address to figure out if it was a call
in the scheduler or not. In the kernel version I am planning to use an
is_sched_function()-like approach. But if none of that works, I will explore
this possibility; thanks for the suggestion.

>> - WAIT But you (daniel) wants to fake the atomicity between preempt_disable and
>> its tracepoint!
>>
>> Yes, I do, but this is a very straightforward step/assumption: the atomicity is
>> about the real-event and the tracepoint that notifies it. It is not about two
>> different events.
>>
>> That is why it is worth letting the modeling rules clarify the behavior of the
>> system, without making non-obvious implications in the code, so we can have a
>> model that better fits the Linux actions/events and avoids ambiguity.
> 
> You can easily build a little shim between the model and the tracehooks
> that fixes this up if you don't want to stick it in the model proper.
> 
> All the information is there. Heck you can even do that 3/3 thing
> internally I think.
> 

Ack, let's see what I can come up with.

Thanks!

-- Daniel


* Re: [RFC 1/3] softirq: Use preempt_latency_stop/start to trace preemption
  2019-05-29 12:22     ` Steven Rostedt
@ 2019-06-04 10:39       ` Daniel Bristot de Oliveira
  2019-06-04 12:01         ` Steven Rostedt
  0 siblings, 1 reply; 27+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-06-04 10:39 UTC (permalink / raw)
  To: Steven Rostedt, Joel Fernandes
  Cc: Daniel Bristot de Oliveira, linux-kernel, williams, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Paul E. McKenney,
	Matthias Kaehlcke, Frederic Weisbecker, Yangtao Li,
	Tommaso Cucinotta



On 29/05/2019 14:22, Steven Rostedt wrote:
> On Wed, 29 May 2019 05:30:56 -0400
> Joel Fernandes <joel@joelfernandes.org> wrote:
> 
>> Yes, I think so. Also this patch changes CALLER_ADDR0 passed to the
>> tracepoint because there's one more level of a non-inlined function call
>> in the call chain right?  Very least the changelog should document this
>> change in functional behavior, IMO.

In practice I am seeing no change in the values printed, but there is another
problem in this regard: there are cases in which both caller and parent have
the same address.

I am quite sure it has to do with the in_lock_function() behavior. Anyway, I was
already planning to propose a cleanup of in_lock_function/in_sched_function.
I will investigate it more.

> This sounds more like a break in behavior, not a functional change. I
> guess moving it to a header and making it a static __always_inline
> should be fine though.

Steve, which header should I use?

Thanks!

-- Daniel

> -- Steve
> 


* Re: [RFC 1/3] softirq: Use preempt_latency_stop/start to trace preemption
  2019-06-04 10:39       ` Daniel Bristot de Oliveira
@ 2019-06-04 12:01         ` Steven Rostedt
  0 siblings, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2019-06-04 12:01 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: Joel Fernandes, linux-kernel, williams, Ingo Molnar,
	Peter Zijlstra, Thomas Gleixner, Paul E. McKenney,
	Matthias Kaehlcke, Frederic Weisbecker, Yangtao Li,
	Tommaso Cucinotta

On Tue, 4 Jun 2019 12:39:03 +0200
Daniel Bristot de Oliveira <bristot@redhat.com> wrote:

> > This sounds more like a break in behavior, not a functional change. I
> > guess moving it to a header and making it a static __always_inline
> > should be fine though.  
> 
> Steve, which header should I use?

I would say include/linux/preempt.h if Peter doesn't have any issues
with that.

-- Steve


* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change
  2019-06-04 10:12         ` Daniel Bristot de Oliveira
@ 2019-06-04 15:01           ` Steven Rostedt
  2019-06-05 15:16             ` Joel Fernandes
  0 siblings, 1 reply; 27+ messages in thread
From: Steven Rostedt @ 2019-06-04 15:01 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: Joel Fernandes, Peter Zijlstra, linux-kernel, williams, daniel,
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	Matthias Kaehlcke, Frederic Weisbecker, Yangtao Li,
	Tommaso Cucinotta

On Tue, 4 Jun 2019 12:12:36 +0200
Daniel Bristot de Oliveira <bristot@redhat.com> wrote:

> I discussed this with Steve at the Summit on the Summit (the reason why I did
> not reply this email earlier is because I was in the conf/traveling), and he
> also agrees with peterz, disabling and (mainly) re-enabling IRQs costs too much.
> 
> We need to find another way to resolve this problem (or mitigate the cost).... :-(.
> 
> Ideas?

I thought we talked about using flags in the pc to let us know that we
are about to trace the preemption off?


If an interrupt comes in, we check the flag:

	irq_enter()
		preempt_count_add(HARDIRQ_OFFSET)


Then we can have something like:

preempt_count_add(val) {
	int pc = preempt_count();
	int trace_val = TRACE_FLAG;

	if (val == HARDIRQ_OFFSET && (pc & TRACE_FLAG)) {
		__preempt_count_sub(TRACE_FLAG);
		trace_val |= TRACE_SET;
	}

	__preempt_count_add(val + trace_val);
	if (!pc)
		trace_preempt_disable();
	__preempt_count_sub(trace_val);
}


And in trace_preempt_enable() we have:

	if ((preempt_count() & TRACE_SET) && in_irq())
		return;

Thus, we won't call the preempt_enable tracing code if we started it due
to an interrupt preempting the process of setting the trace.

Note, I'm writing this while extremely tired (still have yet to get
more than 4 hours of sleep) so it may still have bugs, but you should
get the idea ;-)

-- Steve


* Re: [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change
  2019-06-04 15:01           ` Steven Rostedt
@ 2019-06-05 15:16             ` Joel Fernandes
  0 siblings, 0 replies; 27+ messages in thread
From: Joel Fernandes @ 2019-06-05 15:16 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Daniel Bristot de Oliveira, Peter Zijlstra, linux-kernel,
	williams, daniel, Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	Matthias Kaehlcke, Frederic Weisbecker, Yangtao Li,
	Tommaso Cucinotta

On Tue, Jun 04, 2019 at 11:01:27AM -0400, Steven Rostedt wrote:
> On Tue, 4 Jun 2019 12:12:36 +0200
> Daniel Bristot de Oliveira <bristot@redhat.com> wrote:
> 
> > I discussed this with Steve at the Summit on the Summit (the reason why I did
> > not reply this email earlier is because I was in the conf/traveling), and he
> > also agrees with peterz, disabling and (mainly) re-enabling IRQs costs too much.
> > 
> > We need to find another way to resolve this problem (or mitigate the cost).... :-(.
> > 
> > Ideas?
> 
> I thought we talked about using flags in the pc to let us know that we
> are about to trace the preemption off?
> 
> 
> If an interrupt comes in, we check the flag:
> 
> 	irq_enter()
> 		preempt_count_add(HARDIRQ_OFFSET)
> 
> 
> Then we can have something like:
> 
> preempt_count_add(val) {
> 	int pc = preempt_count();
> 	int trace_val = TRACE_FLAG;
> 
> 	if (val == HARDIRQ_OFFSET && (pc & TRACE_FLAG)) {
> 		__preempt_count_sub(TRACE_FLAG);
> 		trace_val |= TRACE_SET;
> 	}
> 
> 	__preempt_count_add(val + trace_val);
> 	if (!pc)
> 		trace_preempt_disable();
> 	__preempt_count_sub(trace_val);
> }
> 
> 
> And in trace_preempt_enable() we have:
> 
> 	if ((preempt_count() & TRACE_SET) && in_irq())
> 		return;
> 
> Thus, we won't call the preempt_enable tracing code if we started it due
> to an interrupt preempting the process of setting the trace.

Hmm, the interrupt handler will not be calling preempt_enable anyway since
the preempt counter was already set by the interrupted code. The situation
Daniel describes in patch 2/3 is:

---------------------------- %< ------------------------------
	THREAD					IRQ
	   |					 |
preempt_disable() {
    __preempt_count_add(1)
	------->	    smp_apic_timer_interrupt() {
				preempt_disable()
				    do not trace (preempt count >= 1)
				    ....
				preempt_enable()
				    do not trace (preempt count >= 1)
			    }
    trace_preempt_disable();
}
---------------------------- >% ------------------------------

Here the preempt_enable in the IRQ will not call tracing. If I understand
correctly, he *does* want to call tracing since the IRQ's preempt
disable/enable section could be running for some time, and so
it would not be nice to miss these events from the traces.

Regarding the preempt count flag, I really like your idea. I did not
fully follow the code snippet you shared above though. My understanding of your
idea initially was: we set a flag in pc, then increment the preempt count,
call tracing, and then reset the pc flag. In between the increment and the call
to the tracer, if we get interrupted, then we check the flag from IRQ
context, and still call tracing. If we get interrupted after calling tracing,
but before resetting the flag, then we may end up with a nested pair of
preempt enable/disable calls, but that's OK. At least we won't miss any
events.

At least I think we only need to modify the preempt disable path, something
like this in pseudocode (in the THREAD context):

set   pc flag
inc   pc
		<--- if interrupted here, still trace from IRQ.
call  tracer
		<--- if interrupted here, still trace from IRQ.
reset pc flag
		<--- if interrupted here, no need to trace from IRQ since
		     we already recorded at least one preempt disable event.


Did I get your idea right?

I need to think about this some more, but got to run to an appointment. Will
take a look once I'm back.

> Note, I'm writing this while extremely tired (still have yet to get
> more than 4 hours of sleep) so it may still have bugs, but you should
> get the idea ;-)

No problem ;-)

thanks,

 - Joel



end of thread, other threads:[~2019-06-05 15:16 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-28 15:16 [RFC 0/3] preempt_tracer: Fix preempt_disable tracepoint Daniel Bristot de Oliveira
2019-05-28 15:16 ` [RFC 1/3] softirq: Use preempt_latency_stop/start to trace preemption Daniel Bristot de Oliveira
2019-05-29  8:29   ` Peter Zijlstra
2019-05-29  9:30   ` Joel Fernandes
2019-05-29 12:22     ` Steven Rostedt
2019-06-04 10:39       ` Daniel Bristot de Oliveira
2019-06-04 12:01         ` Steven Rostedt
2019-05-28 15:16 ` [RFC 2/3] preempt_tracer: Disable IRQ while starting/stopping due to a preempt_counter change Daniel Bristot de Oliveira
2019-05-29  8:33   ` Peter Zijlstra
2019-05-29  9:40     ` Daniel Bristot de Oliveira
2019-05-29 10:20       ` Peter Zijlstra
2019-05-29 12:39         ` Steven Rostedt
2019-05-29 13:19           ` Peter Zijlstra
2019-05-29 13:29             ` Peter Zijlstra
2019-05-29 13:42             ` Steven Rostedt
2019-05-29 13:49               ` Peter Zijlstra
2019-05-29 13:58                 ` Steven Rostedt
2019-05-29 13:51         ` Daniel Bristot de Oliveira
2019-05-29 18:21           ` Peter Zijlstra
2019-06-04 10:20             ` Daniel Bristot de Oliveira
2019-05-31  7:47       ` Joel Fernandes
2019-06-04 10:12         ` Daniel Bristot de Oliveira
2019-06-04 15:01           ` Steven Rostedt
2019-06-05 15:16             ` Joel Fernandes
2019-05-28 15:16 ` [RFC 3/3] preempt_tracer: Use a percpu variable to control traceble calls Daniel Bristot de Oliveira
2019-05-29  8:41   ` Peter Zijlstra
2019-05-29  9:48     ` Daniel Bristot de Oliveira
