LKML Archive on lore.kernel.org
From: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
To: Venkatesh Pallipadi <venki@google.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@elte.hu>,
	linux-kernel@vger.kernel.org, Paul Turner <pjt@google.com>,
	Mike Galbraith <efault@gmx.de>, Nick Piggin <npiggin@gmail.com>,
	Tim Chen <tim.c.chen@intel.com>, Alex Shi <alex.shi@intel.com>
Subject: Re: [PATCH] sched: Wholesale removal of sd_idle logic
Date: Tue, 15 Feb 2011 22:31:27 +0530
Message-ID: <20110215170127.GA28865@dirshya.in.ibm.com>
In-Reply-To: <1297723130-693-1-git-send-email-venki@google.com>

* Venkatesh Pallipadi <venki@google.com> [2011-02-14 14:38:50]:

> sd_idle logic was introduced way back in 2005 (commit 5969fe06),
> as an HT optimization.
> 
> As per the discussion in the thread here
> lkml subject - sched: Resolve sd_idle and first_idle_cpu Catch-22 - v1
> https://patchwork.kernel.org/patch/532501/
> 
> the capacity based logic in the load balancer right now handles this
> in a much cleaner way, handling more than 2 SMT siblings etc, and sd_idle
> does not seem to bring any additional benefits. sd_idle logic also has
> some bugs that have performance impact. Here is the patch that removes
> the sd_idle logic altogether.
> 
> The initial patch here - https://patchwork.kernel.org/patch/532501/
> applies cleanly over the below change and provides a micro-optimization
> for a specific case, where an idle core can pull tasks instead of a
> core with one thread being idle and other thread being busy.
> It will be good to get some data on whether this micro-optimization
> matters or not.
> 
> Also, there was a dependency of sched_mc_power_savings == 2, with sd_idle
> logic. Copying Vaidy to know the impact of this change there.

Hi Venki,

The dependency is there to avoid active balancing when there is a busy
sibling and power save balance is not set.

A second piece of logic would propagate/force sd_idle=1 to induce more
frequent balancing for an idle sibling in the power save balance case.
Removing sd_idle will make this the default.

Your changes look good.  I will test and report.

> Signed-off-by: Venkatesh Pallipadi <venki@google.com>

Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>

> ---
>  kernel/sched_fair.c |   53 ++++++++++----------------------------------------
>  1 files changed, 11 insertions(+), 42 deletions(-)
> 
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 0c26e2d..932dc13 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -2610,7 +2610,6 @@ fix_small_capacity(struct sched_domain *sd, struct sched_group *group)
>   * @this_cpu: Cpu for which load balance is currently performed.
>   * @idle: Idle status of this_cpu
>   * @load_idx: Load index of sched_domain of this_cpu for load calc.
> - * @sd_idle: Idle status of the sched_domain containing group.
>   * @local_group: Does group contain this_cpu.
>   * @cpus: Set of cpus considered for load balancing.
>   * @balance: Should we balance.
> @@ -2618,7 +2617,7 @@ fix_small_capacity(struct sched_domain *sd, struct sched_group *group)
>   */
>  static inline void update_sg_lb_stats(struct sched_domain *sd,
>  			struct sched_group *group, int this_cpu,
> -			enum cpu_idle_type idle, int load_idx, int *sd_idle,
> +			enum cpu_idle_type idle, int load_idx,
>  			int local_group, const struct cpumask *cpus,
>  			int *balance, struct sg_lb_stats *sgs)
>  {
> @@ -2638,9 +2637,6 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
>  	for_each_cpu_and(i, sched_group_cpus(group), cpus) {
>  		struct rq *rq = cpu_rq(i);
> 
> -		if (*sd_idle && rq->nr_running)
> -			*sd_idle = 0;
> -
>  		/* Bias balancing toward cpus of our domain */
>  		if (local_group) {
>  			if (idle_cpu(i) && !first_idle_cpu) {
> @@ -2755,15 +2751,13 @@ static bool update_sd_pick_busiest(struct sched_domain *sd,
>   * @sd: sched_domain whose statistics are to be updated.
>   * @this_cpu: Cpu for which load balance is currently performed.
>   * @idle: Idle status of this_cpu
> - * @sd_idle: Idle status of the sched_domain containing sg.
>   * @cpus: Set of cpus considered for load balancing.
>   * @balance: Should we balance.
>   * @sds: variable to hold the statistics for this sched_domain.
>   */
>  static inline void update_sd_lb_stats(struct sched_domain *sd, int this_cpu,
> -			enum cpu_idle_type idle, int *sd_idle,
> -			const struct cpumask *cpus, int *balance,
> -			struct sd_lb_stats *sds)
> +			enum cpu_idle_type idle, const struct cpumask *cpus,
> +			int *balance, struct sd_lb_stats *sds)
>  {
>  	struct sched_domain *child = sd->child;
>  	struct sched_group *sg = sd->groups;
> @@ -2781,7 +2775,7 @@ static inline void update_sd_lb_stats(struct sched_domain *sd, int this_cpu,
> 
>  		local_group = cpumask_test_cpu(this_cpu, sched_group_cpus(sg));
>  		memset(&sgs, 0, sizeof(sgs));
> -		update_sg_lb_stats(sd, sg, this_cpu, idle, load_idx, sd_idle,
> +		update_sg_lb_stats(sd, sg, this_cpu, idle, load_idx,
>  				local_group, cpus, balance, &sgs);
> 
>  		if (local_group && !(*balance))
> @@ -3033,7 +3027,6 @@ static inline void calculate_imbalance(struct sd_lb_stats *sds, int this_cpu,
>   * @imbalance: Variable which stores amount of weighted load which should
>   *		be moved to restore balance/put a group to idle.
>   * @idle: The idle status of this_cpu.
> - * @sd_idle: The idleness of sd
>   * @cpus: The set of CPUs under consideration for load-balancing.
>   * @balance: Pointer to a variable indicating if this_cpu
>   *	is the appropriate cpu to perform load balancing at this_level.
> @@ -3046,7 +3039,7 @@ static inline void calculate_imbalance(struct sd_lb_stats *sds, int this_cpu,
>  static struct sched_group *
>  find_busiest_group(struct sched_domain *sd, int this_cpu,
>  		   unsigned long *imbalance, enum cpu_idle_type idle,
> -		   int *sd_idle, const struct cpumask *cpus, int *balance)
> +		   const struct cpumask *cpus, int *balance)
>  {
>  	struct sd_lb_stats sds;
> 
> @@ -3056,8 +3049,7 @@ find_busiest_group(struct sched_domain *sd, int this_cpu,
>  	 * Compute the various statistics relavent for load balancing at
>  	 * this level.
>  	 */
> -	update_sd_lb_stats(sd, this_cpu, idle, sd_idle, cpus,
> -					balance, &sds);
> +	update_sd_lb_stats(sd, this_cpu, idle, cpus, balance, &sds);
> 
>  	/* Cases where imbalance does not exist from POV of this_cpu */
>  	/* 1) this_cpu is not the appropriate cpu to perform load balancing
> @@ -3193,7 +3185,7 @@ find_busiest_queue(struct sched_domain *sd, struct sched_group *group,
>  /* Working cpumask for load_balance and load_balance_newidle. */
>  static DEFINE_PER_CPU(cpumask_var_t, load_balance_tmpmask);
> 
> -static int need_active_balance(struct sched_domain *sd, int sd_idle, int idle,
> +static int need_active_balance(struct sched_domain *sd, int idle,
>  			       int busiest_cpu, int this_cpu)
>  {
>  	if (idle == CPU_NEWLY_IDLE) {
> @@ -3225,10 +3217,6 @@ static int need_active_balance(struct sched_domain *sd, int sd_idle, int idle,
>  		 * move_tasks() will succeed.  ld_moved will be true and this
>  		 * active balance code will not be triggered.
>  		 */
> -		if (!sd_idle && sd->flags & SD_SHARE_CPUPOWER &&
> -		    !test_sd_parent(sd, SD_POWERSAVINGS_BALANCE))
> -			return 0;
> -

This condition will nack active balancing for a semi-idle core when
sched_smt_power_savings is not set.  find_busiest_group() itself should
have returned NULL if there is no power savings opportunity.
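
To make the removed condition easier to reason about, here is a minimal
standalone sketch of it as a predicate.  The struct, the MODEL_* flag
values and the function name are hypothetical stand-ins, not kernel
definitions; parent_flags == 0 models a domain with no parent, on the
assumption that test_sd_parent() evaluates to 0 in that case.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel flags; the values are illustrative. */
#define MODEL_SD_SHARE_CPUPOWER        0x1u
#define MODEL_SD_POWERSAVINGS_BALANCE  0x2u

struct model_domain {
	unsigned int flags;        /* flags of the domain being balanced */
	unsigned int parent_flags; /* flags of its parent, 0 if there is none */
};

/*
 * Mirrors the removed hunk: returns true when the old code would have
 * hit "return 0" and refused active balancing.
 */
static bool model_old_check_vetoes(const struct model_domain *sd, int sd_idle)
{
	return !sd_idle &&
	       (sd->flags & MODEL_SD_SHARE_CPUPOWER) &&
	       !(sd->parent_flags & MODEL_SD_POWERSAVINGS_BALANCE);
}
```

In this model the veto only fires for an SMT domain with a busy sibling
(sd_idle == 0) whose parent does not have power save balance set.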

>  		if (sched_mc_power_savings < POWERSAVINGS_BALANCE_WAKEUP)
>  			return 0;
>  	}
> @@ -3246,7 +3234,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>  			struct sched_domain *sd, enum cpu_idle_type idle,
>  			int *balance)
>  {
> -	int ld_moved, all_pinned = 0, active_balance = 0, sd_idle = 0;
> +	int ld_moved, all_pinned = 0, active_balance = 0;
>  	struct sched_group *group;
>  	unsigned long imbalance;
>  	struct rq *busiest;
> @@ -3255,20 +3243,10 @@ static int load_balance(int this_cpu, struct rq *this_rq,
> 
>  	cpumask_copy(cpus, cpu_active_mask);
> 
> -	/*
> -	 * When power savings policy is enabled for the parent domain, idle
> -	 * sibling can pick up load irrespective of busy siblings. In this case,
> -	 * let the state of idle sibling percolate up as CPU_IDLE, instead of
> -	 * portraying it as CPU_NOT_IDLE.
> -	 */
> -	if (idle != CPU_NOT_IDLE && sd->flags & SD_SHARE_CPUPOWER &&
> -	    !test_sd_parent(sd, SD_POWERSAVINGS_BALANCE))
> -		sd_idle = 1;

This effectively becomes the default once sd_idle is removed.
When powersave balance is set, we want to run the load balancer more
frequently.
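
For reference, a rough sketch of how sd_idle was derived before this
patch, pieced together from the two removed hunks (the initial value in
load_balance() and the clearing in update_sg_lb_stats()).  The enum, the
function name and the flattened nr_running[] array are assumptions made
for illustration; the real code walks the runqueues of each group.

```c
#include <stddef.h>

/* Hypothetical mirror of enum cpu_idle_type, for illustration only. */
enum model_idle_type {
	MODEL_CPU_IDLE,
	MODEL_CPU_NOT_IDLE,
	MODEL_CPU_NEWLY_IDLE
};

/*
 * is_smt_domain:    models sd->flags & SD_SHARE_CPUPOWER
 * parent_powersave: models test_sd_parent(sd, SD_POWERSAVINGS_BALANCE)
 * nr_running[]:     models rq->nr_running for each cpu that is scanned
 */
static int model_sd_idle(enum model_idle_type idle, int is_smt_domain,
			 int parent_powersave,
			 const unsigned int *nr_running, size_t ncpus)
{
	int sd_idle = 0;
	size_t i;

	/* Initial value, as in the hunk removed from load_balance(). */
	if (idle != MODEL_CPU_NOT_IDLE && is_smt_domain && !parent_powersave)
		sd_idle = 1;

	/* Any cpu with runnable tasks clears it, as in update_sg_lb_stats(). */
	for (i = 0; i < ncpus; i++)
		if (sd_idle && nr_running[i])
			sd_idle = 0;

	return sd_idle;
}
```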

> -
>  	schedstat_inc(sd, lb_count[idle]);
> 
>  redo:
> -	group = find_busiest_group(sd, this_cpu, &imbalance, idle, &sd_idle,
> +	group = find_busiest_group(sd, this_cpu, &imbalance, idle,
>  				   cpus, balance);
> 
>  	if (*balance == 0)
> @@ -3330,8 +3308,7 @@ redo:
>  		if (idle != CPU_NEWLY_IDLE)
>  			sd->nr_balance_failed++;
> 
> -		if (need_active_balance(sd, sd_idle, idle, cpu_of(busiest),
> -					this_cpu)) {
> +		if (need_active_balance(sd, idle, cpu_of(busiest), this_cpu)) {
>  			raw_spin_lock_irqsave(&busiest->lock, flags);
> 
>  			/* don't kick the active_load_balance_cpu_stop,
> @@ -3386,10 +3363,6 @@ redo:
>  			sd->balance_interval *= 2;
>  	}
> 
> -	if (!ld_moved && !sd_idle && sd->flags & SD_SHARE_CPUPOWER &&
> -	    !test_sd_parent(sd, SD_POWERSAVINGS_BALANCE))
> -		ld_moved = -1;

I have not figured out where ld_moved is checked for -1 and why we
need to treat this as a special case.

Your bug fix in idle_balance() for if (pulled_task) {...} is a good
catch.
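
One possible reading, offered as a sketch rather than a statement about
the actual callers: nothing needs to compare against -1 if a caller only
tests the return value for non-zero, the way an if (pulled_task) style
check does.  The enum and function below are hypothetical and only model
such a caller.

```c
/* Hypothetical caller-side model; this is not the kernel's code. */
enum caller_idle_state {
	CALLER_CPU_IDLE,
	CALLER_CPU_NOT_IDLE
};

/*
 * Models a caller that treats any non-zero ld_moved as "something is
 * running here now".  A return of -1 is non-zero, so it takes the same
 * branch as a real pull even though no task was moved.
 */
static enum caller_idle_state
model_idle_after_balance(enum caller_idle_state idle, int ld_moved)
{
	if (ld_moved)
		return CALLER_CPU_NOT_IDLE;
	return idle;
}
```

Under that assumption, -1 would act as "no task moved, but do not
consider this cpu idle for the rest of the pass", which is the case the
removed hunks flagged for a busy SMT sibling.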

> -
>  	goto out;
> 
>  out_balanced:
> @@ -3403,11 +3376,7 @@ out_one_pinned:
>  			(sd->balance_interval < sd->max_interval))
>  		sd->balance_interval *= 2;
> 
> -	if (!sd_idle && sd->flags & SD_SHARE_CPUPOWER &&
> -	    !test_sd_parent(sd, SD_POWERSAVINGS_BALANCE))
> -		ld_moved = -1;
> -	else
> -		ld_moved = 0;

Ack.  But why did we have to flag this case earlier?

> +	ld_moved = 0;
>  out:
>  	return ld_moved;
>  }

--Vaidy


