LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* kernel/timer: avoid spurious ksoftirqd wakeups
@ 2015-04-02  1:44 Marcelo Tosatti
  2015-04-02 13:58 ` Rik van Riel
  2015-04-02 14:59 ` Frederic Weisbecker
  0 siblings, 2 replies; 11+ messages in thread
From: Marcelo Tosatti @ 2015-04-02  1:44 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, Rik van Riel, Thomas Gleixner, Paul E. McKenney,
	Viresh Kumar


It is only necessary to raise timer softirq
in case there are active timers or irq work 
to do.

Limit the ksoftirqd wakeup to those cases.

Fixes a latency spike with isolated CPUs and 
nohz full mode.

Reported-and-tested-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 8c5a197..0c065f9 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -192,7 +192,7 @@ extern void set_timer_slack(struct timer_list *time, int slack_hz);
  * locks the timer base and does the comparison against the given
  * jiffie.
  */
-extern unsigned long get_next_timer_interrupt(unsigned long now);
+extern unsigned long get_next_timer_interrupt(unsigned long now, bool *raise_softirq);
 
 /*
  * Timer-statistics info:
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index a4c4eda..615e276 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -568,6 +568,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 	unsigned long rcu_delta_jiffies;
 	struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
 	u64 time_delta;
+	bool raise_softirq;
 
 	time_delta = timekeeping_max_deferment();
 
@@ -582,9 +583,11 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 	    arch_needs_cpu() || irq_work_needs_cpu()) {
 		next_jiffies = last_jiffies + 1;
 		delta_jiffies = 1;
+		raise_softirq = true;
 	} else {
 		/* Get the next timer wheel timer */
-		next_jiffies = get_next_timer_interrupt(last_jiffies);
+		next_jiffies = get_next_timer_interrupt(last_jiffies,
+							&raise_softirq);
 		delta_jiffies = next_jiffies - last_jiffies;
 		if (rcu_delta_jiffies < delta_jiffies) {
 			next_jiffies = last_jiffies + rcu_delta_jiffies;
@@ -703,7 +706,8 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 		 */
 		tick_do_update_jiffies64(ktime_get());
 	}
-	raise_softirq_irqoff(TIMER_SOFTIRQ);
+	if (raise_softirq)
+		raise_softirq_irqoff(TIMER_SOFTIRQ);
 out:
 	ts->next_jiffies = next_jiffies;
 	ts->last_jiffies = last_jiffies;
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 2d3f5c5..771f811 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1343,7 +1343,7 @@ static unsigned long cmp_next_hrtimer_event(unsigned long now,
  * get_next_timer_interrupt - return the jiffy of the next pending timer
  * @now: current time (in jiffies)
  */
-unsigned long get_next_timer_interrupt(unsigned long now)
+unsigned long get_next_timer_interrupt(unsigned long now, bool *raise_softirq)
 {
 	struct tvec_base *base = __this_cpu_read(tvec_bases);
 	unsigned long expires = now + NEXT_TIMER_MAX_DELTA;
@@ -1357,6 +1357,7 @@ unsigned long get_next_timer_interrupt(unsigned long now)
 
 	spin_lock(&base->lock);
 	if (base->active_timers) {
+		*raise_softirq = true;
 		if (time_before_eq(base->next_timer, base->timer_jiffies))
 			base->next_timer = __next_timer_interrupt(base);
 		expires = base->next_timer;



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: kernel/timer: avoid spurious ksoftirqd wakeups
  2015-04-02  1:44 kernel/timer: avoid spurious ksoftirqd wakeups Marcelo Tosatti
@ 2015-04-02 13:58 ` Rik van Riel
  2015-04-02 20:53   ` Marcelo Tosatti
  2015-04-02 14:59 ` Frederic Weisbecker
  1 sibling, 1 reply; 11+ messages in thread
From: Rik van Riel @ 2015-04-02 13:58 UTC (permalink / raw)
  To: Marcelo Tosatti, Frederic Weisbecker
  Cc: linux-kernel, Thomas Gleixner, Paul E. McKenney, Viresh Kumar

On 04/01/2015 09:44 PM, Marcelo Tosatti wrote:

> +++ b/kernel/time/tick-sched.c
> @@ -568,6 +568,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
>  	unsigned long rcu_delta_jiffies;
>  	struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
>  	u64 time_delta;
> +	bool raise_softirq;

You may want to initialize this to false. Nothing else
in the code ever seems to set it to false.

It may work in your test due to that address on the stack
already being zeroed out due to a lucky coincidence, but
that is not a guarantee.

> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1343,7 +1343,7 @@ static unsigned long cmp_next_hrtimer_event(unsigned long now,
>   * get_next_timer_interrupt - return the jiffy of the next pending timer
>   * @now: current time (in jiffies)
>   */
> -unsigned long get_next_timer_interrupt(unsigned long now)
> +unsigned long get_next_timer_interrupt(unsigned long now, bool *raise_softirq)
>  {
>  	struct tvec_base *base = __this_cpu_read(tvec_bases);
>  	unsigned long expires = now + NEXT_TIMER_MAX_DELTA;
> @@ -1357,6 +1357,7 @@ unsigned long get_next_timer_interrupt(unsigned long now)
>  
>  	spin_lock(&base->lock);
>  	if (base->active_timers) {
> +		*raise_softirq = true;
>  		if (time_before_eq(base->next_timer, base->timer_jiffies))
>  			base->next_timer = __next_timer_interrupt(base);
>  		expires = base->next_timer;

Given that run_timer_softirq() only actually does something
if the timer has expired, would it make sense to only raise
the softirq after the timer has expired?


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: kernel/timer: avoid spurious ksoftirqd wakeups
  2015-04-02  1:44 kernel/timer: avoid spurious ksoftirqd wakeups Marcelo Tosatti
  2015-04-02 13:58 ` Rik van Riel
@ 2015-04-02 14:59 ` Frederic Weisbecker
  2015-04-02 21:08   ` Marcelo Tosatti
  1 sibling, 1 reply; 11+ messages in thread
From: Frederic Weisbecker @ 2015-04-02 14:59 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: linux-kernel, Rik van Riel, Thomas Gleixner, Paul E. McKenney,
	Viresh Kumar

On Wed, Apr 01, 2015 at 10:44:55PM -0300, Marcelo Tosatti wrote:
> 
> It is only necessary to raise timer softirq
> in case there are active timers or irq work 
> to do.
> 
> Limit the ksoftirqd wakeup to those cases.
> 
> Fixes a latency spike with isolated CPUs and 
> nohz full mode.
> 
> Reported-and-tested-by: Luiz Capitulino <lcapitulino@redhat.com>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> 
> diff --git a/include/linux/timer.h b/include/linux/timer.h
> index 8c5a197..0c065f9 100644
> --- a/include/linux/timer.h
> +++ b/include/linux/timer.h
> @@ -192,7 +192,7 @@ extern void set_timer_slack(struct timer_list *time, int slack_hz);
>   * locks the timer base and does the comparison against the given
>   * jiffie.
>   */
> -extern unsigned long get_next_timer_interrupt(unsigned long now);
> +extern unsigned long get_next_timer_interrupt(unsigned long now, bool *raise_softirq);
>  
>  /*
>   * Timer-statistics info:
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index a4c4eda..615e276 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -568,6 +568,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
>  	unsigned long rcu_delta_jiffies;
>  	struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
>  	u64 time_delta;
> +	bool raise_softirq;
>  
>  	time_delta = timekeeping_max_deferment();
>  
> @@ -582,9 +583,11 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
>  	    arch_needs_cpu() || irq_work_needs_cpu()) {
>  		next_jiffies = last_jiffies + 1;
>  		delta_jiffies = 1;
> +		raise_softirq = true;

I believe that irq_work doesn't need the softirq. It needs a tick only in order to call
irq_work_tick(). And I believe this is the same for RCU which needs a call to
rcu_check_callbacks(), but it might need something else that the softirq does
(but this is the timer softirq, not the rcu one). 

>  	} else {
>  		/* Get the next timer wheel timer */
> -		next_jiffies = get_next_timer_interrupt(last_jiffies);
> +		next_jiffies = get_next_timer_interrupt(last_jiffies,
> +							&raise_softirq);
>  		delta_jiffies = next_jiffies - last_jiffies;
>  		if (rcu_delta_jiffies < delta_jiffies) {
>  			next_jiffies = last_jiffies + rcu_delta_jiffies;
> @@ -703,7 +706,8 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
>  		 */
>  		tick_do_update_jiffies64(ktime_get());
>  	}
> -	raise_softirq_irqoff(TIMER_SOFTIRQ);
> +	if (raise_softirq)
> +		raise_softirq_irqoff(TIMER_SOFTIRQ);
>  out:
>  	ts->next_jiffies = next_jiffies;
>  	ts->last_jiffies = last_jiffies;

Lets look at the things outside the pending timer list that can end up failing
to program the timer because it is in the past already:

_ timekeeping_max_deferment(): I doubt, the value is pretty high
_ scheduler_tick_max_deferment(); it's one second long, way enough to never be in
  the past by the time we program the clock
_ RCU, irq_work, arch: may be, if the last jiffies update is far enough. But apparently
  the problem is elsewhere since you keep the softirq for these and your patch solves your
  problem.
_ In case hrtimer runs in low-res mode and the next hrtimer is very close, or even in the past
  already, you may run into such issue. And hrtimer doesn't need the timer softirq, at least not
  to run the callbacks. It needs it only if hrtimer is switching to high-res mode, I think it's
  a rare event.

Now it would be nice to identify the issue we are facing here. Are you running in hrtimer low-res
mode?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: kernel/timer: avoid spurious ksoftirqd wakeups
  2015-04-02 13:58 ` Rik van Riel
@ 2015-04-02 20:53   ` Marcelo Tosatti
  0 siblings, 0 replies; 11+ messages in thread
From: Marcelo Tosatti @ 2015-04-02 20:53 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Frederic Weisbecker, linux-kernel, Thomas Gleixner,
	Paul E. McKenney, Viresh Kumar

On Thu, Apr 02, 2015 at 09:58:26AM -0400, Rik van Riel wrote:
> On 04/01/2015 09:44 PM, Marcelo Tosatti wrote:
> 
> > +++ b/kernel/time/tick-sched.c
> > @@ -568,6 +568,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> >  	unsigned long rcu_delta_jiffies;
> >  	struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
> >  	u64 time_delta;
> > +	bool raise_softirq;
> 
> You may want to initialize this to false. Nothing else
> in the code ever seems to set it to false.
> 
> It may work in your test due to that address on the stack
> already being zeroed out due to a lucky coincidence, but
> that is not a guarantee.
> 
> > --- a/kernel/time/timer.c
> > +++ b/kernel/time/timer.c
> > @@ -1343,7 +1343,7 @@ static unsigned long cmp_next_hrtimer_event(unsigned long now,
> >   * get_next_timer_interrupt - return the jiffy of the next pending timer
> >   * @now: current time (in jiffies)
> >   */
> > -unsigned long get_next_timer_interrupt(unsigned long now)
> > +unsigned long get_next_timer_interrupt(unsigned long now, bool *raise_softirq)
> >  {
> >  	struct tvec_base *base = __this_cpu_read(tvec_bases);
> >  	unsigned long expires = now + NEXT_TIMER_MAX_DELTA;
> > @@ -1357,6 +1357,7 @@ unsigned long get_next_timer_interrupt(unsigned long now)
> >  
> >  	spin_lock(&base->lock);
> >  	if (base->active_timers) {
> > +		*raise_softirq = true;
> >  		if (time_before_eq(base->next_timer, base->timer_jiffies))
> >  			base->next_timer = __next_timer_interrupt(base);
> >  		expires = base->next_timer;
> 
> Given that run_timer_softirq() only actually does something
> if the timer has expired, would it make sense to only raise
> the softirq after the timer has expired?

jiffies might be increased by tick_nohz_stop_sched_tick.

So you'd have to test again after increasing jiffies.



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: kernel/timer: avoid spurious ksoftirqd wakeups
  2015-04-02 14:59 ` Frederic Weisbecker
@ 2015-04-02 21:08   ` Marcelo Tosatti
  2015-04-06 23:34     ` Frederic Weisbecker
  0 siblings, 1 reply; 11+ messages in thread
From: Marcelo Tosatti @ 2015-04-02 21:08 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, Rik van Riel, Thomas Gleixner, Paul E. McKenney,
	Viresh Kumar

On Thu, Apr 02, 2015 at 04:59:40PM +0200, Frederic Weisbecker wrote:
> On Wed, Apr 01, 2015 at 10:44:55PM -0300, Marcelo Tosatti wrote:
> > 
> > It is only necessary to raise timer softirq
> > in case there are active timers or irq work 
> > to do.
> > 
> > Limit the ksoftirqd wakeup to those cases.
> > 
> > Fixes a latency spike with isolated CPUs and 
> > nohz full mode.
> > 
> > Reported-and-tested-by: Luiz Capitulino <lcapitulino@redhat.com>
> > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> > 
> > diff --git a/include/linux/timer.h b/include/linux/timer.h
> > index 8c5a197..0c065f9 100644
> > --- a/include/linux/timer.h
> > +++ b/include/linux/timer.h
> > @@ -192,7 +192,7 @@ extern void set_timer_slack(struct timer_list *time, int slack_hz);
> >   * locks the timer base and does the comparison against the given
> >   * jiffie.
> >   */
> > -extern unsigned long get_next_timer_interrupt(unsigned long now);
> > +extern unsigned long get_next_timer_interrupt(unsigned long now, bool *raise_softirq);
> >  
> >  /*
> >   * Timer-statistics info:
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index a4c4eda..615e276 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -568,6 +568,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> >  	unsigned long rcu_delta_jiffies;
> >  	struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
> >  	u64 time_delta;
> > +	bool raise_softirq;
> >  
> >  	time_delta = timekeeping_max_deferment();
> >  
> > @@ -582,9 +583,11 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> >  	    arch_needs_cpu() || irq_work_needs_cpu()) {
> >  		next_jiffies = last_jiffies + 1;
> >  		delta_jiffies = 1;
> > +		raise_softirq = true;
> 
> I believe that irq_work doesn't need the softirq. 

Can drop that, right.

> It needs a tick only in order to call
> irq_work_tick(). And I believe this is the same for RCU which needs a call to
> rcu_check_callbacks(), but it might need something else that the softirq does
> (but this is the timer softirq, not the rcu one). 
> 
> >  	} else {
> >  		/* Get the next timer wheel timer */
> > -		next_jiffies = get_next_timer_interrupt(last_jiffies);
> > +		next_jiffies = get_next_timer_interrupt(last_jiffies,
> > +							&raise_softirq);
> >  		delta_jiffies = next_jiffies - last_jiffies;
> >  		if (rcu_delta_jiffies < delta_jiffies) {
> >  			next_jiffies = last_jiffies + rcu_delta_jiffies;
> > @@ -703,7 +706,8 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> >  		 */
> >  		tick_do_update_jiffies64(ktime_get());
> >  	}
> > -	raise_softirq_irqoff(TIMER_SOFTIRQ);
> > +	if (raise_softirq)
> > +		raise_softirq_irqoff(TIMER_SOFTIRQ);
> >  out:
> >  	ts->next_jiffies = next_jiffies;
> >  	ts->last_jiffies = last_jiffies;
> 
> Lets look at the things outside the pending timer list that can end up failing
> to program the timer because it is in the past already:

Is this an attempt to find possible regressions introduced 
by this change ?

> _ timekeeping_max_deferment(): I doubt, the value is pretty high
> _ scheduler_tick_max_deferment(); it's one second long, way enough to never be in
>   the past by the time we program the clock
> _ RCU, irq_work, arch: may be, if the last jiffies update is far enough. But apparently
>   the problem is elsewhere since you keep the softirq for these and your patch solves your
>   problem.
> _ In case hrtimer runs in low-res mode and the next hrtimer is very close, or even in the past
>   already, you may run into such issue. And hrtimer doesn't need the timer softirq, at least not
>   to run the callbacks. It needs it only if hrtimer is switching to high-res mode, I think it's
>   a rare event.
> 
> Now it would be nice to identify the issue we are facing here. Are you running in hrtimer low-res
> mode?

The issue is a latency spike due to ksoftirqd waking up to 
process pending timers, processing two deferred timers, 
but no non-deferred timers.

hrtimer is not in low-res mode.

The issue is ksoftirqd waking up in the first place.



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: kernel/timer: avoid spurious ksoftirqd wakeups
  2015-04-02 21:08   ` Marcelo Tosatti
@ 2015-04-06 23:34     ` Frederic Weisbecker
  2015-04-06 23:51       ` Marcelo Tosatti
  0 siblings, 1 reply; 11+ messages in thread
From: Frederic Weisbecker @ 2015-04-06 23:34 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: linux-kernel, Rik van Riel, Thomas Gleixner, Paul E. McKenney,
	Viresh Kumar

On Thu, Apr 02, 2015 at 06:08:09PM -0300, Marcelo Tosatti wrote:
> On Thu, Apr 02, 2015 at 04:59:40PM +0200, Frederic Weisbecker wrote:
> > On Wed, Apr 01, 2015 at 10:44:55PM -0300, Marcelo Tosatti wrote:
> > > 
> > > It is only necessary to raise timer softirq
> > > in case there are active timers or irq work 
> > > to do.
> > > 
> > > Limit the ksoftirqd wakeup to those cases.
> > > 
> > > Fixes a latency spike with isolated CPUs and 
> > > nohz full mode.
> > > 
> > > Reported-and-tested-by: Luiz Capitulino <lcapitulino@redhat.com>
> > > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> > > 
> > > diff --git a/include/linux/timer.h b/include/linux/timer.h
> > > index 8c5a197..0c065f9 100644
> > > --- a/include/linux/timer.h
> > > +++ b/include/linux/timer.h
> > > @@ -192,7 +192,7 @@ extern void set_timer_slack(struct timer_list *time, int slack_hz);
> > >   * locks the timer base and does the comparison against the given
> > >   * jiffie.
> > >   */
> > > -extern unsigned long get_next_timer_interrupt(unsigned long now);
> > > +extern unsigned long get_next_timer_interrupt(unsigned long now, bool *raise_softirq);
> > >  
> > >  /*
> > >   * Timer-statistics info:
> > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > > index a4c4eda..615e276 100644
> > > --- a/kernel/time/tick-sched.c
> > > +++ b/kernel/time/tick-sched.c
> > > @@ -568,6 +568,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> > >  	unsigned long rcu_delta_jiffies;
> > >  	struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
> > >  	u64 time_delta;
> > > +	bool raise_softirq;
> > >  
> > >  	time_delta = timekeeping_max_deferment();
> > >  
> > > @@ -582,9 +583,11 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> > >  	    arch_needs_cpu() || irq_work_needs_cpu()) {
> > >  		next_jiffies = last_jiffies + 1;
> > >  		delta_jiffies = 1;
> > > +		raise_softirq = true;
> > 
> > I believe that irq_work doesn't need the softirq. 
> 
> Can drop that, right.
> 
> > It needs a tick only in order to call
> > irq_work_tick(). And I believe this is the same for RCU which needs a call to
> > rcu_check_callbacks(), but it might need something else that the softirq does
> > (but this is the timer softirq, not the rcu one). 
> > 
> > >  	} else {
> > >  		/* Get the next timer wheel timer */
> > > -		next_jiffies = get_next_timer_interrupt(last_jiffies);
> > > +		next_jiffies = get_next_timer_interrupt(last_jiffies,
> > > +							&raise_softirq);
> > >  		delta_jiffies = next_jiffies - last_jiffies;
> > >  		if (rcu_delta_jiffies < delta_jiffies) {
> > >  			next_jiffies = last_jiffies + rcu_delta_jiffies;
> > > @@ -703,7 +706,8 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> > >  		 */
> > >  		tick_do_update_jiffies64(ktime_get());
> > >  	}
> > > -	raise_softirq_irqoff(TIMER_SOFTIRQ);
> > > +	if (raise_softirq)
> > > +		raise_softirq_irqoff(TIMER_SOFTIRQ);
> > >  out:
> > >  	ts->next_jiffies = next_jiffies;
> > >  	ts->last_jiffies = last_jiffies;
> > 
> > Lets look at the things outside the pending timer list that can end up failing
> > to program the timer because it is in the past already:
> 
> Is this an attempt to find possible regressions introduced 
> by this change ?

Yeah, it would be nice to make sure that the cause of these softirqs isn't
mistakenly ignored. And also I want to be sure we really understand what we
are doing, which is not the case right now as we don't know what is causing
this expired timer.

> 
> > _ timekeeping_max_deferment(): I doubt, the value is pretty high
> > _ scheduler_tick_max_deferment(); it's one second long, way enough to never be in
> >   the past by the time we program the clock
> > _ RCU, irq_work, arch: may be, if the last jiffies update is far enough. But apparently
> >   the problem is elsewhere since you keep the softirq for these and your patch solves your
> >   problem.
> > _ In case hrtimer runs in low-res mode and the next hrtimer is very close, or even in the past
> >   already, you may run into such issue. And hrtimer doesn't need the timer softirq, at least not
> >   to run the callbacks. It needs it only if hrtimer is switching to high-res mode, I think it's
> >   a rare event.
> > 
> > Now it would be nice to identify the issue we are facing here. Are you running in hrtimer low-res
> > mode?
> 
> The issue is a latency spike due to ksoftirqd waking up to 
> process pending timers, processing two deferred timers, 
> but no non-deferred timers.
> 
> hrtimer is not in low-res mode.
> 
> The issue is ksoftirqd waking up in the first place.

Sure, but why is it waking up exactly?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: kernel/timer: avoid spurious ksoftirqd wakeups
  2015-04-06 23:34     ` Frederic Weisbecker
@ 2015-04-06 23:51       ` Marcelo Tosatti
  2015-04-07 20:17         ` Frederic Weisbecker
  0 siblings, 1 reply; 11+ messages in thread
From: Marcelo Tosatti @ 2015-04-06 23:51 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, Rik van Riel, Thomas Gleixner, Paul E. McKenney,
	Viresh Kumar

On Tue, Apr 07, 2015 at 01:34:15AM +0200, Frederic Weisbecker wrote:
> On Thu, Apr 02, 2015 at 06:08:09PM -0300, Marcelo Tosatti wrote:
> > On Thu, Apr 02, 2015 at 04:59:40PM +0200, Frederic Weisbecker wrote:
> > > On Wed, Apr 01, 2015 at 10:44:55PM -0300, Marcelo Tosatti wrote:
> > > > 
> > > > It is only necessary to raise timer softirq
> > > > in case there are active timers or irq work 
> > > > to do.
> > > > 
> > > > Limit the ksoftirqd wakeup to those cases.
> > > > 
> > > > Fixes a latency spike with isolated CPUs and 
> > > > nohz full mode.
> > > > 
> > > > Reported-and-tested-by: Luiz Capitulino <lcapitulino@redhat.com>
> > > > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> > > > 
> > > > diff --git a/include/linux/timer.h b/include/linux/timer.h
> > > > index 8c5a197..0c065f9 100644
> > > > --- a/include/linux/timer.h
> > > > +++ b/include/linux/timer.h
> > > > @@ -192,7 +192,7 @@ extern void set_timer_slack(struct timer_list *time, int slack_hz);
> > > >   * locks the timer base and does the comparison against the given
> > > >   * jiffie.
> > > >   */
> > > > -extern unsigned long get_next_timer_interrupt(unsigned long now);
> > > > +extern unsigned long get_next_timer_interrupt(unsigned long now, bool *raise_softirq);
> > > >  
> > > >  /*
> > > >   * Timer-statistics info:
> > > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > > > index a4c4eda..615e276 100644
> > > > --- a/kernel/time/tick-sched.c
> > > > +++ b/kernel/time/tick-sched.c
> > > > @@ -568,6 +568,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> > > >  	unsigned long rcu_delta_jiffies;
> > > >  	struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
> > > >  	u64 time_delta;
> > > > +	bool raise_softirq;
> > > >  
> > > >  	time_delta = timekeeping_max_deferment();
> > > >  
> > > > @@ -582,9 +583,11 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> > > >  	    arch_needs_cpu() || irq_work_needs_cpu()) {
> > > >  		next_jiffies = last_jiffies + 1;
> > > >  		delta_jiffies = 1;
> > > > +		raise_softirq = true;
> > > 
> > > I believe that irq_work doesn't need the softirq. 
> > 
> > Can drop that, right.
> > 
> > > It needs a tick only in order to call
> > > irq_work_tick(). And I believe this is the same for RCU which needs a call to
> > > rcu_check_callbacks(), but it might need something else that the softirq does
> > > (but this is the timer softirq, not the rcu one). 
> > > 
> > > >  	} else {
> > > >  		/* Get the next timer wheel timer */
> > > > -		next_jiffies = get_next_timer_interrupt(last_jiffies);
> > > > +		next_jiffies = get_next_timer_interrupt(last_jiffies,
> > > > +							&raise_softirq);
> > > >  		delta_jiffies = next_jiffies - last_jiffies;
> > > >  		if (rcu_delta_jiffies < delta_jiffies) {
> > > >  			next_jiffies = last_jiffies + rcu_delta_jiffies;
> > > > @@ -703,7 +706,8 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> > > >  		 */
> > > >  		tick_do_update_jiffies64(ktime_get());
> > > >  	}
> > > > -	raise_softirq_irqoff(TIMER_SOFTIRQ);
> > > > +	if (raise_softirq)
> > > > +		raise_softirq_irqoff(TIMER_SOFTIRQ);
> > > >  out:
> > > >  	ts->next_jiffies = next_jiffies;
> > > >  	ts->last_jiffies = last_jiffies;
> > > 
> > > Lets look at the things outside the pending timer list that can end up failing
> > > to program the timer because it is in the past already:
> > 
> > Is this an attempt to find possible regressions introduced 
> > by this change ?
> 
> Yeah, it would be nice to make sure that the cause of these softirqs isn't
> mistakenly ignored. 
> And also I want to be sure we really understand what we
> are doing, which is not the case right now as we don't know what is causing
> this expired timer.

What is the interrupt that is the cause for tick_nohz_stop_sched_tick,
you mean? 

       <...>-45815 [015] d...2.. 25722056692012 (+961446): kvm_exit: reason EXTERNAL_INTERRUPT rip 0x7f5e448479d0 info 0 800000ef
       <...>-45815 [015] d..h1.. 25722056692844 (+832): apic_timer_fn<-__run_hrtimer
       <...>-45815 [015] d...1.. 25722056695442 (+2598): raise_softirq_irqoff <-tick_nohz_stop_sched_tick

Emulation of guest APIC timer by hrtimer (apic_timer_fn).

> > > _ timekeeping_max_deferment(): I doubt, the value is pretty high
> > > _ scheduler_tick_max_deferment(); it's one second long, way enough to never be in
> > >   the past by the time we program the clock
> > > _ RCU, irq_work, arch: may be, if the last jiffies update is far enough. But apparently
> > >   the problem is elsewhere since you keep the softirq for these and your patch solves your
> > >   problem.
> > > _ In case hrtimer runs in low-res mode and the next hrtimer is very close, or even in the past
> > >   already, you may run into such issue. And hrtimer doesn't need the timer softirq, at least not
> > >   to run the callbacks. It needs it only if hrtimer is switching to high-res mode, I think it's
> > >   a rare event.
> > > 
> > > Now it would be nice to identify the issue we are facing here. Are you running in hrtimer low-res
> > > mode?
> > 
> > The issue is a latency spike due to ksoftirqd waking up to 
> > process pending timers, processing two deferred timers, 
> > but no non-deferred timers.
> > 
> > hrtimer is not in low-res mode.
> > 
> > The issue is ksoftirqd waking up in the first place.
> 
> Sure, but why is it waking up exactly?

Because there is a bug (the patch is trying to fix the bug by 
raising timer softirq only when timer softirq handler has any 
work to do).

The only timers pending in the timer list are deferred ones
from vmstat_update:

ksoftirqd/15-265   [015] ....111 25722056709372 (+7098): softirq_entry: vec=1 [action=TIMER]
ksoftirqd/15-265   [015] .....11 25722056709964 (+592): run_timer_softirq <-do_current_softirqs
ksoftirqd/15-265   [015] ....111 25722056714034 (+4070): timer_expire_entry: timer=ffff88082f6f14a0 function=delayed_work_timer_fn now=4480299175
ksoftirqd/15-265   [015] ....112 25722056715738 (+1704):
workqueue_queue_work: work struct=ffff88082f6f1480 function=vmstat_update workqueue=ffff88041f408000 req_cpu=5120 cpu=15
ksoftirqd/15-265   [015] ....112 25722056716304 (+566): workqueue_activate_work: work struct ffff88082f6f1480
ksoftirqd/15-265   [015] ....111 25722056719052 (+2748): timer_expire_exit: timer=ffff88082f6f14a0
ksoftirqd/15-265   [015] ....111 25722056719384 (+332): softirq_exit: vec=1 [action=TIMER]

Which should only be processed once there are actual add_timer timers
being fired (there are no such add_timer timers on this CPU).

Does that make sense?



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: kernel/timer: avoid spurious ksoftirqd wakeups
  2015-04-06 23:51       ` Marcelo Tosatti
@ 2015-04-07 20:17         ` Frederic Weisbecker
  2015-04-07 22:29           ` Marcelo Tosatti
  0 siblings, 1 reply; 11+ messages in thread
From: Frederic Weisbecker @ 2015-04-07 20:17 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: linux-kernel, Rik van Riel, Thomas Gleixner, Paul E. McKenney,
	Viresh Kumar

On Mon, Apr 06, 2015 at 08:51:26PM -0300, Marcelo Tosatti wrote:
> On Tue, Apr 07, 2015 at 01:34:15AM +0200, Frederic Weisbecker wrote:
> > Yeah, it would be nice to make sure that the cause of these softirqs isn't
> > mistakenly ignored. 
> > And also I want to be sure we really understand what we
> > are doing, which is not the case right now as we don't know what is causing
> > this expired timer.
> 
> What is the interrupt that is the cause for tick_nohz_stop_sched_tick,
> you mean? 
> 
>        <...>-45815 [015] d...2.. 25722056692012 (+961446): kvm_exit: reason EXTERNAL_INTERRUPT rip 0x7f5e448479d0 info 0 800000ef
>        <...>-45815 [015] d..h1.. 25722056692844 (+832): apic_timer_fn<-__run_hrtimer
>        <...>-45815 [015] d...1.. 25722056695442 (+2598): raise_softirq_irqoff <-tick_nohz_stop_sched_tick
> 
> Emulation of guest APIC timer by hrtimer (apic_timer_fn).

Nope, I meant what is the root cause of the softirq.
But lets continue on that below:

> > Sure, but why is it waking up exactly?
> 
> Because there is a bug (the patch is trying to fix the bug by 
> raising timer softirq only when timer softirq handler has any 
> work to do).
> 
> The only timers pending in the timer list are deferred ones
> from vmstat_update:
> 
> ksoftirqd/15-265   [015] ....111 25722056709372 (+7098): softirq_entry: vec=1 [action=TIMER]
> ksoftirqd/15-265   [015] .....11 25722056709964 (+592): run_timer_softirq <-do_current_softirqs
> ksoftirqd/15-265   [015] ....111 25722056714034 (+4070): timer_expire_entry: timer=ffff88082f6f14a0 function=delayed_work_timer_fn now=4480299175
> ksoftirqd/15-265   [015] ....112 25722056715738 (+1704):
> workqueue_queue_work: work struct=ffff88082f6f1480 function=vmstat_update workqueue=ffff88041f408000 req_cpu=5120 cpu=15
> ksoftirqd/15-265   [015] ....112 25722056716304 (+566): workqueue_activate_work: work struct ffff88082f6f1480
> ksoftirqd/15-265   [015] ....111 25722056719052 (+2748): timer_expire_exit: timer=ffff88082f6f14a0
> ksoftirqd/15-265   [015] ....111 25722056719384 (+332): softirq_exit: vec=1 [action=TIMER]
> 
> Which should only be processed once there are actual add_timer timers
> being fired (there are no such add_timer timers on this CPU).
> 
> Does that make sense?

So the source of these softirqs is those deffered timers? But defferable timers
are only defferable in idle-nohz mode, not full-nohz. They are actually deffered
in practice in full-nohz but it's a bug :o)  (which I need to fix).

Still, I don't think this is the source of the softirqs since your patch fixes
the issue of non-timers triggering softirqs.

So here is the issue: something that is not a "struct timer_list" is causing the
expiry time of the next tick to be in the past or now. See tick_nohz_stop_sched_tick(),
the softirq is triggered when delta_jiffies < 1 or when the timer fails to be reprogrammed
because it has already expired.

What can cause this expiry time to be now or in the past? Well for that we need to
check everything that is used to evaluate the next tick:

1) struct timer_list Timers
2) low-res hrtimers
3) scheduler_tick_max_deferment
4) timekeeping_max_deferment
5) (rcu|arch|irq_work)_needs_tick()
6) maybe something else I'm missing

Your patch has reduced the softirq to only be triggered in case 1) and it works
for you. This means the spurious softirqs that you saw were caused by 2,3,4,5 or 6.
I want to know which one and why because I need to understand exactly which event
is going to not trigger a softirq anymore after this patch. We want know that to 
ensure there is no side effect after your patch.

Thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: kernel/timer: avoid spurious ksoftirqd wakeups
  2015-04-07 20:17         ` Frederic Weisbecker
@ 2015-04-07 22:29           ` Marcelo Tosatti
  0 siblings, 0 replies; 11+ messages in thread
From: Marcelo Tosatti @ 2015-04-07 22:29 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, Rik van Riel, Thomas Gleixner, Paul E. McKenney,
	Viresh Kumar, Luiz Capitulino

On Tue, Apr 07, 2015 at 10:17:23PM +0200, Frederic Weisbecker wrote:
> On Mon, Apr 06, 2015 at 08:51:26PM -0300, Marcelo Tosatti wrote:
> > On Tue, Apr 07, 2015 at 01:34:15AM +0200, Frederic Weisbecker wrote:
> > > Yeah, it would be nice to make sure that the cause of these softirqs isn't
> > > mistakenly ignored. 
> > > And also I want to be sure we really understand what we
> > > are doing, which is not the case right now as we don't know what is causing
> > > this expired timer.
> > 
> > What is the interrupt that is the cause for tick_nohz_stop_sched_tick,
> > you mean? 
> > 
> >        <...>-45815 [015] d...2.. 25722056692012 (+961446): kvm_exit: reason EXTERNAL_INTERRUPT rip 0x7f5e448479d0 info 0 800000ef
> >        <...>-45815 [015] d..h1.. 25722056692844 (+832): apic_timer_fn<-__run_hrtimer
> >        <...>-45815 [015] d...1.. 25722056695442 (+2598): raise_softirq_irqoff <-tick_nohz_stop_sched_tick
> > 
> > Emulation of guest APIC timer by hrtimer (apic_timer_fn).
> 
> Nope, I meant what is the root cause of the softirq.
> But lets continue on that below:
> 
> > > Sure, but why is it waking up exactly?
> > 
> > Because there is a bug (the patch is trying to fix the bug by 
> > raising timer softirq only when timer softirq handler has any 
> > work to do).
> > 
> > The only timers pending in the timer list are deferred ones
> > from vmstat_update:
> > 
> > ksoftirqd/15-265   [015] ....111 25722056709372 (+7098): softirq_entry: vec=1 [action=TIMER]
> > ksoftirqd/15-265   [015] .....11 25722056709964 (+592): run_timer_softirq <-do_current_softirqs
> > ksoftirqd/15-265   [015] ....111 25722056714034 (+4070): timer_expire_entry: timer=ffff88082f6f14a0 function=delayed_work_timer_fn now=4480299175
> > ksoftirqd/15-265   [015] ....112 25722056715738 (+1704):
> > workqueue_queue_work: work struct=ffff88082f6f1480 function=vmstat_update workqueue=ffff88041f408000 req_cpu=5120 cpu=15
> > ksoftirqd/15-265   [015] ....112 25722056716304 (+566): workqueue_activate_work: work struct ffff88082f6f1480
> > ksoftirqd/15-265   [015] ....111 25722056719052 (+2748): timer_expire_exit: timer=ffff88082f6f14a0
> > ksoftirqd/15-265   [015] ....111 25722056719384 (+332): softirq_exit: vec=1 [action=TIMER]
> > 
> > Which should only be processed once there are actual add_timer timers
> > being fired (there are no such add_timer timers on this CPU).
> > 
> > Does that make sense?
> 
> So the source of these softirqs is those deffered timers? But defferable timers
> are only defferable in idle-nohz mode, not full-nohz. They are actually deffered
> in practice in full-nohz but it's a bug :o)  (which I need to fix).
> 
> Still, I don't think this is the source of the softirqs since your patch fixes
> the issue of non-timers triggering softirqs.
> 
> So here is the issue: something that is not a "struct timer_list" is causing the
> expiry time of the next tick to be in the past or now. See tick_nohz_stop_sched_tick(),
> the softirq is triggered when delta_jiffies < 1 

delta_jiffies = NEXT_TIMER_MAX_DELTA.

tick_nohz_stop_sched_tick:     delta_jiffies: 1073741823 rcu_delta_jiffies: 18446744073709551615 tick_stopped: 1

> or when the timer fails to be reprogrammed
> because it has already expired.

Right, missed that. I'll ask Luiz to gather info on why its 
failing.

> 
> What can cause this expiry time to be now or in the past? Well for that we need to
> check everything that is used to evaluate the next tick:
> 
> 1) struct timer_list Timers
> 2) low-res hrtimers
> 3) scheduler_tick_max_deferment
> 4) timekeeping_max_deferment
> 5) (rcu|arch|irq_work)_needs_tick()
> 6) maybe something else I'm missing
> 
> Your patch has reduced the softirq to only be triggered in case 1) and it works
> for you. This means the spurious softirqs that you saw were caused by 2,3,4,5 or 6.
> I want to know which one and why because I need to understand exactly which event
> is going to not trigger a softirq anymore after this patch. We want know that to 
> ensure there is no side effect after your patch.
> 
> Thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: kernel/timer: avoid spurious ksoftirqd wakeups
  2015-04-02  3:32 ` Hillf Danton
@ 2015-04-02 20:29   ` Marcelo Tosatti
  0 siblings, 0 replies; 11+ messages in thread
From: Marcelo Tosatti @ 2015-04-02 20:29 UTC (permalink / raw)
  To: Hillf Danton; +Cc: 'Luiz Capitulino', linux-kernel

On Thu, Apr 02, 2015 at 11:32:49AM +0800, Hillf Danton wrote:
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -568,6 +568,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
> >  	unsigned long rcu_delta_jiffies;
> >  	struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
> >  	u64 time_delta;
> > +	bool raise_softirq;
> > 
> s/raise_softirq/raise_softirq = false/ ?

Not necessary, variable is initialized before being used.

> >  	time_delta = timekeeping_max_deferment();
> > 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: kernel/timer: avoid spurious ksoftirqd wakeups
       [not found] <042301d06cf5$2ec7ae90$8c570bb0$@alibaba-inc.com>
@ 2015-04-02  3:32 ` Hillf Danton
  2015-04-02 20:29   ` Marcelo Tosatti
  0 siblings, 1 reply; 11+ messages in thread
From: Hillf Danton @ 2015-04-02  3:32 UTC (permalink / raw)
  To: 'Marcelo Tosatti'
  Cc: 'Luiz Capitulino', linux-kernel, Hillf Danton

> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -568,6 +568,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
>  	unsigned long rcu_delta_jiffies;
>  	struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
>  	u64 time_delta;
> +	bool raise_softirq;
> 
s/raise_softirq/raise_softirq = false/ ?
>  	time_delta = timekeeping_max_deferment();
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-04-07 22:29 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-02  1:44 kernel/timer: avoid spurious ksoftirqd wakeups Marcelo Tosatti
2015-04-02 13:58 ` Rik van Riel
2015-04-02 20:53   ` Marcelo Tosatti
2015-04-02 14:59 ` Frederic Weisbecker
2015-04-02 21:08   ` Marcelo Tosatti
2015-04-06 23:34     ` Frederic Weisbecker
2015-04-06 23:51       ` Marcelo Tosatti
2015-04-07 20:17         ` Frederic Weisbecker
2015-04-07 22:29           ` Marcelo Tosatti
     [not found] <042301d06cf5$2ec7ae90$8c570bb0$@alibaba-inc.com>
2015-04-02  3:32 ` Hillf Danton
2015-04-02 20:29   ` Marcelo Tosatti

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).