LKML Archive on lore.kernel.org
* [PATCH 0/2] sched/fair: Improve cost accounting of newidle_balance
@ 2021-10-04 17:14 Vincent Guittot
From: Vincent Guittot @ 2021-10-04 17:14 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, linux-kernel, tim.c.chen
  Cc: Vincent Guittot

Improve the precision of the cost of newidle_balance by taking into account
the time spent updating blocked load, and skip the load balance loop
entirely if there is no chance to run at least one iteration.

Vincent Guittot (2):
  sched/fair: account update_blocked_averages in newidle_balance cost
  sched/fair: Skip update_blocked_averages if we are deferring load
    balance

 kernel/sched/fair.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

-- 
2.17.1



* [PATCH 1/2] sched/fair: account update_blocked_averages in newidle_balance cost
From: Vincent Guittot @ 2021-10-04 17:14 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, linux-kernel, tim.c.chen
  Cc: Vincent Guittot

The time spent updating the blocked load can be significant, depending on
the complexity of the cgroup hierarchy. Take this time into account when
deciding to stop newidle_balance() because it exceeds the expected idle
time.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8943dbb94365..1f78b2e3b71c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10810,7 +10810,7 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
 	int this_cpu = this_rq->cpu;
 	struct sched_domain *sd;
 	int pulled_task = 0;
-	u64 curr_cost = 0;
+	u64 t0, domain_cost, curr_cost = 0;
 
 	update_misfit_status(NULL, this_rq);
 
@@ -10855,11 +10855,14 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
 
 	raw_spin_rq_unlock(this_rq);
 
+	t0 = sched_clock_cpu(this_cpu);
 	update_blocked_averages(this_cpu);
+	domain_cost = sched_clock_cpu(this_cpu) - t0;
+	curr_cost += domain_cost;
+
 	rcu_read_lock();
 	for_each_domain(this_cpu, sd) {
 		int continue_balancing = 1;
-		u64 t0, domain_cost;
 
 		if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) {
 			update_next_balance(sd, &next_balance);
-- 
2.17.1



* [PATCH 2/2] sched/fair: Skip update_blocked_averages if we are deferring load balance
From: Vincent Guittot @ 2021-10-04 17:14 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, bristot, linux-kernel, tim.c.chen
  Cc: Vincent Guittot

In newidle_balance(), the scheduler skips the load balance for the newly
idle cpu when the 1st sd of this_rq satisfies:

   this_rq->avg_idle < sd->max_newidle_lb_cost

Doing a costly call to update_blocked_averages() will not be useful and
simply adds overhead when this condition is true.

Check the condition early in newidle_balance() to skip
update_blocked_averages() when possible.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 kernel/sched/fair.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1f78b2e3b71c..1294b78503d9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10841,17 +10841,20 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
 	 */
 	rq_unpin_lock(this_rq, rf);
 
+	rcu_read_lock();
+	sd = rcu_dereference_check_sched_domain(this_rq->sd);
+
 	if (this_rq->avg_idle < sysctl_sched_migration_cost ||
-	    !READ_ONCE(this_rq->rd->overload)) {
+		!READ_ONCE(this_rq->rd->overload) ||
+		(sd && this_rq->avg_idle < sd->max_newidle_lb_cost)) {
 
-		rcu_read_lock();
-		sd = rcu_dereference_check_sched_domain(this_rq->sd);
 		if (sd)
 			update_next_balance(sd, &next_balance);
 		rcu_read_unlock();
 
 		goto out;
 	}
+	rcu_read_unlock();
 
 	raw_spin_rq_unlock(this_rq);
 
-- 
2.17.1



* Re: [PATCH 0/2] sched/fair: Improve cost accounting of newidle_balance
From: Tim Chen @ 2021-10-04 23:06 UTC (permalink / raw)
  To: Vincent Guittot, mingo, peterz, juri.lelli, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, linux-kernel

On Mon, 2021-10-04 at 19:14 +0200, Vincent Guittot wrote:
> Improve the precision of the cost of newidle_balance by taking into
> account the time spent updating blocked load, and skip the load balance
> loop entirely if there is no chance to run at least one iteration.
> 

Thanks Vincent.

I'll ask our benchmark team to give it a spin.  They've been fairly
busy, so I hope to get some bandwidth from them in the next couple of
weeks.

Tim

> Vincent Guittot (2):
>   sched/fair: account update_blocked_averages in newidle_balance cost
>   sched/fair: Skip update_blocked_averages if we are deferring load
>     balance
> 
>  kernel/sched/fair.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 



* Re: [PATCH 1/2] sched/fair: account update_blocked_averages in newidle_balance cost
From: Peter Zijlstra @ 2021-10-05 20:40 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: mingo, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, linux-kernel, tim.c.chen

On Mon, Oct 04, 2021 at 07:14:50PM +0200, Vincent Guittot wrote:
> The time spent updating the blocked load can be significant, depending on
> the complexity of the cgroup hierarchy. Take this time into account when
> deciding to stop newidle_balance() because it exceeds the expected idle
> time.
> 
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  kernel/sched/fair.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8943dbb94365..1f78b2e3b71c 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10810,7 +10810,7 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
>  	int this_cpu = this_rq->cpu;
>  	struct sched_domain *sd;
>  	int pulled_task = 0;
> -	u64 curr_cost = 0;
> +	u64 t0, domain_cost, curr_cost = 0;
>  
>  	update_misfit_status(NULL, this_rq);
>  
> @@ -10855,11 +10855,14 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
>  
>  	raw_spin_rq_unlock(this_rq);
>  
> +	t0 = sched_clock_cpu(this_cpu);
>  	update_blocked_averages(this_cpu);
> +	domain_cost = sched_clock_cpu(this_cpu) - t0;
> +	curr_cost += domain_cost;
> +
>  	rcu_read_lock();
>  	for_each_domain(this_cpu, sd) {
>  		int continue_balancing = 1;
> -		u64 t0, domain_cost;
>  
>  		if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) {
>  			update_next_balance(sd, &next_balance);

Does this make sense? It avoids a bunch of clock calls (and thereby
accounts more actual time).

Also, perhaps we should use some asymmetric IIR filter instead of a strict
MAX filter for max_newidle_lb_cost.

---
Index: linux-2.6/kernel/sched/fair.c
===================================================================
--- linux-2.6.orig/kernel/sched/fair.c
+++ linux-2.6/kernel/sched/fair.c
@@ -10759,9 +10759,9 @@ static int newidle_balance(struct rq *th
 {
 	unsigned long next_balance = jiffies + HZ;
 	int this_cpu = this_rq->cpu;
+	u64 t0, t1, curr_cost = 0;
 	struct sched_domain *sd;
 	int pulled_task = 0;
-	u64 t0, domain_cost, curr_cost = 0;
 
 	update_misfit_status(NULL, this_rq);
 
@@ -10808,8 +10808,9 @@ static int newidle_balance(struct rq *th
 
 	t0 = sched_clock_cpu(this_cpu);
 	update_blocked_averages(this_cpu);
-	domain_cost = sched_clock_cpu(this_cpu) - t0;
-	curr_cost += domain_cost;
+	t1 = sched_clock_cpu(this_cpu);
+	curr_cost += t1 - t0;
+	t0 = t1;
 
 	rcu_read_lock();
 	for_each_domain(this_cpu, sd) {
@@ -10821,17 +10822,19 @@ static int newidle_balance(struct rq *th
 		}
 
 		if (sd->flags & SD_BALANCE_NEWIDLE) {
-			t0 = sched_clock_cpu(this_cpu);
+			u64 domain_cost;
 
 			pulled_task = load_balance(this_cpu, this_rq,
 						   sd, CPU_NEWLY_IDLE,
 						   &continue_balancing);
 
-			domain_cost = sched_clock_cpu(this_cpu) - t0;
+			t1 = sched_clock_cpu(this_cpu);
+			domain_cost = t1 - t0;
 			if (domain_cost > sd->max_newidle_lb_cost)
 				sd->max_newidle_lb_cost = domain_cost;
 
 			curr_cost += domain_cost;
+			t0 = t1;
 		}
 
 		update_next_balance(sd, &next_balance);


* Re: [PATCH 2/2] sched/fair: Skip update_blocked_averages if we are deferring load balance
From: Peter Zijlstra @ 2021-10-05 20:49 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: mingo, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman,
	bristot, linux-kernel, tim.c.chen

On Mon, Oct 04, 2021 at 07:14:51PM +0200, Vincent Guittot wrote:
> In newidle_balance(), the scheduler skips the load balance for the newly
> idle cpu when the 1st sd of this_rq satisfies:
> 
>    this_rq->avg_idle < sd->max_newidle_lb_cost
> 
> Doing a costly call to update_blocked_averages() will not be useful and
> simply adds overhead when this condition is true.
> 
> Check the condition early in newidle_balance() to skip
> update_blocked_averages() when possible.
> 
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> ---
>  kernel/sched/fair.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 1f78b2e3b71c..1294b78503d9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10841,17 +10841,20 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
>  	 */
>  	rq_unpin_lock(this_rq, rf);
>  
> +	rcu_read_lock();
> +	sd = rcu_dereference_check_sched_domain(this_rq->sd);
> +
>  	if (this_rq->avg_idle < sysctl_sched_migration_cost ||
> -	    !READ_ONCE(this_rq->rd->overload)) {
> +		!READ_ONCE(this_rq->rd->overload) ||
> +		(sd && this_rq->avg_idle < sd->max_newidle_lb_cost)) {

set cino=(0:0, please.

Also, people have, in the past, tried to get rid of the first clause
here, perhaps this can replace it instead of augment it?

>  
> -		rcu_read_lock();
> -		sd = rcu_dereference_check_sched_domain(this_rq->sd);
>  		if (sd)
>  			update_next_balance(sd, &next_balance);
>  		rcu_read_unlock();
>  
>  		goto out;
>  	}
> +	rcu_read_unlock();

There's another rcu_read_lock section right below this, at the very
least we can merge them.

Also, IIRC we're running all this with preemption disabled, and since
rcu-sched got folded into rcu, all that rcu_read_*lock() stuff isn't
strictly required anymore.

(we're full circle there, back in the day RCU implied RCU-sched and the
scheduler relied on preempt-disable for lots of this stuff, then Paul
split them, and I spent a fair amount of time adding all this
rcu_read_*lock() crud, and now he's merged them again, and it can go away
again).

Except of course, I think we need to make rcu_dereference_check happy
first :/


* Re: [PATCH 1/2] sched/fair: account update_blocked_averages in newidle_balance cost
From: Vincent Guittot @ 2021-10-06  7:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Juri Lelli, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, linux-kernel,
	Tim Chen

On Tue, 5 Oct 2021 at 22:41, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Oct 04, 2021 at 07:14:50PM +0200, Vincent Guittot wrote:
> > The time spent updating the blocked load can be significant, depending on
> > the complexity of the cgroup hierarchy. Take this time into account when
> > deciding to stop newidle_balance() because it exceeds the expected idle
> > time.
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > ---
> >  kernel/sched/fair.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 8943dbb94365..1f78b2e3b71c 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -10810,7 +10810,7 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
> >       int this_cpu = this_rq->cpu;
> >       struct sched_domain *sd;
> >       int pulled_task = 0;
> > -     u64 curr_cost = 0;
> > +     u64 t0, domain_cost, curr_cost = 0;
> >
> >       update_misfit_status(NULL, this_rq);
> >
> > @@ -10855,11 +10855,14 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
> >
> >       raw_spin_rq_unlock(this_rq);
> >
> > +     t0 = sched_clock_cpu(this_cpu);
> >       update_blocked_averages(this_cpu);
> > +     domain_cost = sched_clock_cpu(this_cpu) - t0;
> > +     curr_cost += domain_cost;
> > +
> >       rcu_read_lock();
> >       for_each_domain(this_cpu, sd) {
> >               int continue_balancing = 1;
> > -             u64 t0, domain_cost;
> >
> >               if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) {
> >                       update_next_balance(sd, &next_balance);
>
> Does this make sense? It avoids a bunch of clock calls (and thereby
> accounts more actual time).

Originally, I didn't want to modify the current accounting of the
sched_domain but only account the sometimes large
update_blocked_averages(). But I agree that we can make sure to account
more actual time.
>
> Also, perhaps we should use some asymmetric IIR filter instead of a
> strict MAX filter for max_newidle_lb_cost.

Ok. I'm going to look at this and see how all this goes

>
> ---
> Index: linux-2.6/kernel/sched/fair.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched/fair.c
> +++ linux-2.6/kernel/sched/fair.c
> @@ -10759,9 +10759,9 @@ static int newidle_balance(struct rq *th
>  {
>         unsigned long next_balance = jiffies + HZ;
>         int this_cpu = this_rq->cpu;
> +       u64 t0, t1, curr_cost = 0;
>         struct sched_domain *sd;
>         int pulled_task = 0;
> -       u64 t0, domain_cost, curr_cost = 0;
>
>         update_misfit_status(NULL, this_rq);
>
> @@ -10808,8 +10808,9 @@ static int newidle_balance(struct rq *th
>
>         t0 = sched_clock_cpu(this_cpu);
>         update_blocked_averages(this_cpu);
> -       domain_cost = sched_clock_cpu(this_cpu) - t0;
> -       curr_cost += domain_cost;
> +       t1 = sched_clock_cpu(this_cpu);
> +       curr_cost += t1 - t0;
> +       t0 = t1;
>
>         rcu_read_lock();
>         for_each_domain(this_cpu, sd) {
> @@ -10821,17 +10822,19 @@ static int newidle_balance(struct rq *th
>                 }
>
>                 if (sd->flags & SD_BALANCE_NEWIDLE) {
> -                       t0 = sched_clock_cpu(this_cpu);
> +                       u64 domain_cost;
>
>                         pulled_task = load_balance(this_cpu, this_rq,
>                                                    sd, CPU_NEWLY_IDLE,
>                                                    &continue_balancing);
>
> -                       domain_cost = sched_clock_cpu(this_cpu) - t0;
> +                       t1 = sched_clock_cpu(this_cpu);
> +                       domain_cost = t1 - t0;
>                         if (domain_cost > sd->max_newidle_lb_cost)
>                                 sd->max_newidle_lb_cost = domain_cost;
>
>                         curr_cost += domain_cost;
> +                       t0 = t1;
>                 }
>
>                 update_next_balance(sd, &next_balance);


* Re: [PATCH 2/2] sched/fair: Skip update_blocked_averages if we are deferring load balance
From: Vincent Guittot @ 2021-10-06  8:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Juri Lelli, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, linux-kernel,
	Tim Chen

On Tue, 5 Oct 2021 at 22:49, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Oct 04, 2021 at 07:14:51PM +0200, Vincent Guittot wrote:
> > In newidle_balance(), the scheduler skips the load balance for the newly
> > idle cpu when the 1st sd of this_rq satisfies:
> >
> >    this_rq->avg_idle < sd->max_newidle_lb_cost
> >
> > Doing a costly call to update_blocked_averages() will not be useful and
> > simply adds overhead when this condition is true.
> >
> > Check the condition early in newidle_balance() to skip
> > update_blocked_averages() when possible.
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> > ---
> >  kernel/sched/fair.c | 9 ++++++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 1f78b2e3b71c..1294b78503d9 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -10841,17 +10841,20 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
> >        */
> >       rq_unpin_lock(this_rq, rf);
> >
> > +     rcu_read_lock();
> > +     sd = rcu_dereference_check_sched_domain(this_rq->sd);
> > +
> >       if (this_rq->avg_idle < sysctl_sched_migration_cost ||
> > -         !READ_ONCE(this_rq->rd->overload)) {
> > +             !READ_ONCE(this_rq->rd->overload) ||
> > +             (sd && this_rq->avg_idle < sd->max_newidle_lb_cost)) {
>
> set cino=(0:0, please.
>
> Also, people have, in the past, tried to get rid of the first clause
> here, perhaps this can replace it instead of augment it?

Yes, that's a good point.
Either sd->max_newidle_lb_cost >= sysctl_sched_migration_cost, in which
case the current condition is covered by the new one, or
sd->max_newidle_lb_cost < sysctl_sched_migration_cost and we will run the
newidle balance. But this also means that we have time to run the newly
idle LB.

>
> >
> > -             rcu_read_lock();
> > -             sd = rcu_dereference_check_sched_domain(this_rq->sd);
> >               if (sd)
> >                       update_next_balance(sd, &next_balance);
> >               rcu_read_unlock();
> >
> >               goto out;
> >       }
> > +     rcu_read_unlock();
>
> There's another rcu_read_lock section right below this, at the very
> least we can merge them.

We release the rq lock in between, which adds more conditions to check in
all situations.

>
> Also, IIRC we're running all this with preemption disabled, and since
> rcu-sched got folded into rcu, all that rcu_read_*lock() stuff isn't
> strictly required anymore.
>
> (we're full circle there, back in the day RCU implied RCU-sched and the
> scheduler relied on preempt-disable for lots of this stuff, then Paul
> split them, and I spent a fair amount of time adding all this
> rcu_read_*lock() crud, and now he's merged them again, and it can go away
> again).
>
> Except of course, I think we need to make rcu_dereference_check happy
> first :/


* Re: [PATCH 1/2] sched/fair: account update_blocked_averages in newidle_balance cost
From: Vincent Guittot @ 2021-10-06  8:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Juri Lelli, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, linux-kernel,
	Tim Chen

On Wed, 6 Oct 2021 at 09:52, Vincent Guittot <vincent.guittot@linaro.org> wrote:
>
> On Tue, 5 Oct 2021 at 22:41, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Mon, Oct 04, 2021 at 07:14:50PM +0200, Vincent Guittot wrote:
> > > The time spent updating the blocked load can be significant, depending on
> > > the complexity of the cgroup hierarchy. Take this time into account when
> > > deciding to stop newidle_balance() because it exceeds the expected idle
> > > time.
> > >
> > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > > ---
> > >  kernel/sched/fair.c | 7 +++++--
> > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 8943dbb94365..1f78b2e3b71c 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -10810,7 +10810,7 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
> > >       int this_cpu = this_rq->cpu;
> > >       struct sched_domain *sd;
> > >       int pulled_task = 0;
> > > -     u64 curr_cost = 0;
> > > +     u64 t0, domain_cost, curr_cost = 0;
> > >
> > >       update_misfit_status(NULL, this_rq);
> > >
> > > @@ -10855,11 +10855,14 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
> > >
> > >       raw_spin_rq_unlock(this_rq);
> > >
> > > +     t0 = sched_clock_cpu(this_cpu);
> > >       update_blocked_averages(this_cpu);
> > > +     domain_cost = sched_clock_cpu(this_cpu) - t0;
> > > +     curr_cost += domain_cost;
> > > +
> > >       rcu_read_lock();
> > >       for_each_domain(this_cpu, sd) {
> > >               int continue_balancing = 1;
> > > -             u64 t0, domain_cost;
> > >
> > >               if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) {
> > >                       update_next_balance(sd, &next_balance);
> >
> > Does this make sense? It avoids a bunch of clock calls (and thereby
> > accounts more actual time).
>
> Originally, I didn't want to modify the current accounting of
> sched_domain but only account the sometime large
> update_blocked_averages(). but i agree that we can ensure to account
> more actual time
> >
> > Also, perhaps we should use some asymmetric IIR filter instead of a
> > strict MAX filter for max_newidle_lb_cost.
>
> Ok. I'm going to look at this and see how all this goes
>
> >
> > ---
> > Index: linux-2.6/kernel/sched/fair.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/sched/fair.c
> > +++ linux-2.6/kernel/sched/fair.c
> > @@ -10759,9 +10759,9 @@ static int newidle_balance(struct rq *th
> >  {
> >         unsigned long next_balance = jiffies + HZ;
> >         int this_cpu = this_rq->cpu;
> > +       u64 t0, t1, curr_cost = 0;
> >         struct sched_domain *sd;
> >         int pulled_task = 0;
> > -       u64 t0, domain_cost, curr_cost = 0;
> >
> >         update_misfit_status(NULL, this_rq);
> >
> > @@ -10808,8 +10808,9 @@ static int newidle_balance(struct rq *th
> >
> >         t0 = sched_clock_cpu(this_cpu);
> >         update_blocked_averages(this_cpu);
> > -       domain_cost = sched_clock_cpu(this_cpu) - t0;

I wonder if we shouldn't include the duration of update_blocked_averages()
in the 1st domain's cost, to make sure that we don't update it but then
abort before running the 1st domain because there is not enough remaining
time.

> > -       curr_cost += domain_cost;
> > +       t1 = sched_clock_cpu(this_cpu);
> > +       curr_cost += t1 - t0;
> > +       t0 = t1;
> >
> >         rcu_read_lock();
> >         for_each_domain(this_cpu, sd) {
> > @@ -10821,17 +10822,19 @@ static int newidle_balance(struct rq *th
> >                 }
> >
> >                 if (sd->flags & SD_BALANCE_NEWIDLE) {
> > -                       t0 = sched_clock_cpu(this_cpu);
> > +                       u64 domain_cost;
> >
> >                         pulled_task = load_balance(this_cpu, this_rq,
> >                                                    sd, CPU_NEWLY_IDLE,
> >                                                    &continue_balancing);
> >
> > -                       domain_cost = sched_clock_cpu(this_cpu) - t0;
> > +                       t1 = sched_clock_cpu(this_cpu);
> > +                       domain_cost = t1 - t0;
> >                         if (domain_cost > sd->max_newidle_lb_cost)
> >                                 sd->max_newidle_lb_cost = domain_cost;
> >
> >                         curr_cost += domain_cost;
> > +                       t0 = t1;
> >                 }
> >
> >                 update_next_balance(sd, &next_balance);

