LKML Archive on lore.kernel.org
From: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
To: Venkatesh Pallipadi <venki@google.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@elte.hu>,
	linux-kernel@vger.kernel.org,
	Paul Turner <pjt@google.com>,
	Mike Galbraith <efault@gmx.de>,
	Nick Piggin <npiggin@gmail.com>,
	Tim Chen <tim.c.chen@intel.com>,
	Alex Shi <alex.shi@intel.com>
Subject: Re: [PATCH] sched: Wholesale removal of sd_idle logic
Date: Tue, 15 Feb 2011 22:31:27 +0530
Message-ID: <20110215170127.GA28865@dirshya.in.ibm.com>
In-Reply-To: <1297723130-693-1-git-send-email-venki@google.com>

* Venkatesh Pallipadi <venki@google.com> [2011-02-14 14:38:50]:

> sd_idle logic was introduced way back in 2005 (commit 5969fe06),
> as an HT optimization.
>
> As per the discussion in the thread here
> lkml subject - sched: Resolve sd_idle and first_idle_cpu Catch-22 - v1
> https://patchwork.kernel.org/patch/532501/
>
> the capacity based logic in the load balancer right now handles this
> in a much cleaner way, handling more than 2 SMT siblings etc, and sd_idle
> does not seem to bring any adiitional benefits. sd_idle logic also has
> some bugs that has performance impact. Here is the patch that removes
> the sd_idle logic altogether.
>
> The initial patch here - https://patchwork.kernel.org/patch/532501/
> applies cleanly over the below change and provides a micro-optimization
> for a specific case, where an idle core can pull tasks instead of a
> core with one thread being idle and other thread being busy.
> It will be good to get some data on whether this micro-optimization
> matters or not.
>
> Also, there was a dependency of sched_mc_power_savings == 2, with sd_idle
> logic. Copying Vaidy to know the impact of this change there.

Hi Venki,

The dependency is to avoid active balancing when there is a busy
sibling and power save balance is not set.
Another logic would propagate/force sd_idle=1 to induce more frequent
balancing for idle sibling in case of power save balance. Removing
sd_idle will make this default.

Your changes look good. I will test and report.

> Signed-off-by: Venkatesh Pallipadi <venki@google.com>

Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>

> ---
>  kernel/sched_fair.c |   53 ++++++++++------------------------------------------
>  1 files changed, 11 insertions(+), 42 deletions(-)
>
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 0c26e2d..932dc13 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -2610,7 +2610,6 @@ fix_small_capacity(struct sched_domain *sd, struct sched_group *group)
>   * @this_cpu: Cpu for which load balance is currently performed.
>   * @idle: Idle status of this_cpu
>   * @load_idx: Load index of sched_domain of this_cpu for load calc.
> - * @sd_idle: Idle status of the sched_domain containing group.
>   * @local_group: Does group contain this_cpu.
>   * @cpus: Set of cpus considered for load balancing.
>   * @balance: Should we balance.
> @@ -2618,7 +2617,7 @@ fix_small_capacity(struct sched_domain *sd, struct sched_group *group)
>   */
>  static inline void update_sg_lb_stats(struct sched_domain *sd,
>  			struct sched_group *group, int this_cpu,
> -			enum cpu_idle_type idle, int load_idx, int *sd_idle,
> +			enum cpu_idle_type idle, int load_idx,
>  			int local_group, const struct cpumask *cpus,
>  			int *balance, struct sg_lb_stats *sgs)
>  {
> @@ -2638,9 +2637,6 @@ static inline void update_sg_lb_stats(struct sched_domain *sd,
>  	for_each_cpu_and(i, sched_group_cpus(group), cpus) {
>  		struct rq *rq = cpu_rq(i);
>  
> -		if (*sd_idle && rq->nr_running)
> -			*sd_idle = 0;
> -
>  		/* Bias balancing toward cpus of our domain */
>  		if (local_group) {
>  			if (idle_cpu(i) && !first_idle_cpu) {
> @@ -2755,15 +2751,13 @@ static bool update_sd_pick_busiest(struct sched_domain *sd,
>   * @sd: sched_domain whose statistics are to be updated.
>   * @this_cpu: Cpu for which load balance is currently performed.
>   * @idle: Idle status of this_cpu
> - * @sd_idle: Idle status of the sched_domain containing sg.
>   * @cpus: Set of cpus considered for load balancing.
>   * @balance: Should we balance.
>   * @sds: variable to hold the statistics for this sched_domain.
>   */
>  static inline void update_sd_lb_stats(struct sched_domain *sd, int this_cpu,
> -			enum cpu_idle_type idle, int *sd_idle,
> -			const struct cpumask *cpus, int *balance,
> -			struct sd_lb_stats *sds)
> +			enum cpu_idle_type idle, const struct cpumask *cpus,
> +			int *balance, struct sd_lb_stats *sds)
>  {
>  	struct sched_domain *child = sd->child;
>  	struct sched_group *sg = sd->groups;
> @@ -2781,7 +2775,7 @@ static inline void update_sd_lb_stats(struct sched_domain *sd, int this_cpu,
>  
>  		local_group = cpumask_test_cpu(this_cpu, sched_group_cpus(sg));
>  		memset(&sgs, 0, sizeof(sgs));
> -		update_sg_lb_stats(sd, sg, this_cpu, idle, load_idx, sd_idle,
> +		update_sg_lb_stats(sd, sg, this_cpu, idle, load_idx,
>  				local_group, cpus, balance, &sgs);
>  
>  		if (local_group && !(*balance))
> @@ -3033,7 +3027,6 @@ static inline void calculate_imbalance(struct sd_lb_stats *sds, int this_cpu,
>   * @imbalance: Variable which stores amount of weighted load which should
>   *		be moved to restore balance/put a group to idle.
>   * @idle: The idle status of this_cpu.
> - * @sd_idle: The idleness of sd
>   * @cpus: The set of CPUs under consideration for load-balancing.
>   * @balance: Pointer to a variable indicating if this_cpu
>   *	is the appropriate cpu to perform load balancing at this_level.
> @@ -3046,7 +3039,7 @@ static inline void calculate_imbalance(struct sd_lb_stats *sds, int this_cpu,
>  static struct sched_group *
>  find_busiest_group(struct sched_domain *sd, int this_cpu,
>  		   unsigned long *imbalance, enum cpu_idle_type idle,
> -		   int *sd_idle, const struct cpumask *cpus, int *balance)
> +		   const struct cpumask *cpus, int *balance)
>  {
>  	struct sd_lb_stats sds;
>  
> @@ -3056,8 +3049,7 @@ find_busiest_group(struct sched_domain *sd, int this_cpu,
>  	 * Compute the various statistics relavent for load balancing at
>  	 * this level.
>  	 */
> -	update_sd_lb_stats(sd, this_cpu, idle, sd_idle, cpus,
> -					balance, &sds);
> +	update_sd_lb_stats(sd, this_cpu, idle, cpus, balance, &sds);
>  
>  	/* Cases where imbalance does not exist from POV of this_cpu */
>  	/* 1) this_cpu is not the appropriate cpu to perform load balancing
> @@ -3193,7 +3185,7 @@ find_busiest_queue(struct sched_domain *sd, struct sched_group *group,
>  /* Working cpumask for load_balance and load_balance_newidle. */
>  static DEFINE_PER_CPU(cpumask_var_t, load_balance_tmpmask);
>  
> -static int need_active_balance(struct sched_domain *sd, int sd_idle, int idle,
> +static int need_active_balance(struct sched_domain *sd, int idle,
>  			       int busiest_cpu, int this_cpu)
>  {
>  	if (idle == CPU_NEWLY_IDLE) {
> @@ -3225,10 +3217,6 @@ static int need_active_balance(struct sched_domain *sd, int sd_idle, int idle,
>  		 * move_tasks() will succeed.  ld_moved will be true and this
>  		 * active balance code will not be triggered.
>  		 */
> -		if (!sd_idle && sd->flags & SD_SHARE_CPUPOWER &&
> -		    !test_sd_parent(sd, SD_POWERSAVINGS_BALANCE))
> -			return 0;
> -

This condition will nack active balancing for a semi-idle core when
sched_smt_powersavings is not set. f_b_g() itself should have
returned NULL if there is no power savings opportunity.

>  		if (sched_mc_power_savings < POWERSAVINGS_BALANCE_WAKEUP)
>  			return 0;
>  	}
> @@ -3246,7 +3234,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>  			struct sched_domain *sd, enum cpu_idle_type idle,
>  			int *balance)
>  {
> -	int ld_moved, all_pinned = 0, active_balance = 0, sd_idle = 0;
> +	int ld_moved, all_pinned = 0, active_balance = 0;
>  	struct sched_group *group;
>  	unsigned long imbalance;
>  	struct rq *busiest;
> @@ -3255,20 +3243,10 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>  
>  	cpumask_copy(cpus, cpu_active_mask);
>  
> -	/*
> -	 * When power savings policy is enabled for the parent domain, idle
> -	 * sibling can pick up load irrespective of busy siblings. In this case,
> -	 * let the state of idle sibling percolate up as CPU_IDLE, instead of
> -	 * portraying it as CPU_NOT_IDLE.
> -	 */
> -	if (idle != CPU_NOT_IDLE && sd->flags & SD_SHARE_CPUPOWER &&
> -	    !test_sd_parent(sd, SD_POWERSAVINGS_BALANCE))
> -		sd_idle = 1;

This is kind of becoming the default now when sd_idle is removed.
When powersave balance is set, we want to run load balancer more
frequently.

> -
>  	schedstat_inc(sd, lb_count[idle]);
>  
>  redo:
> -	group = find_busiest_group(sd, this_cpu, &imbalance, idle, &sd_idle,
> +	group = find_busiest_group(sd, this_cpu, &imbalance, idle,
>  				   cpus, balance);
>  
>  	if (*balance == 0)
> @@ -3330,8 +3308,7 @@ redo:
>  		if (idle != CPU_NEWLY_IDLE)
>  			sd->nr_balance_failed++;
>  
> -		if (need_active_balance(sd, sd_idle, idle, cpu_of(busiest),
> -					this_cpu)) {
> +		if (need_active_balance(sd, idle, cpu_of(busiest), this_cpu)) {
>  			raw_spin_lock_irqsave(&busiest->lock, flags);
>  
>  			/* don't kick the active_load_balance_cpu_stop,
> @@ -3386,10 +3363,6 @@ redo:
>  			sd->balance_interval *= 2;
>  	}
>  
> -	if (!ld_moved && !sd_idle && sd->flags & SD_SHARE_CPUPOWER &&
> -	    !test_sd_parent(sd, SD_POWERSAVINGS_BALANCE))
> -		ld_moved = -1;

I have not figured out where ld_moved is checked for -1 and why we
need to treat this as a special case.
Your bug fix in idle_balance() for if (pulled_task) {...} is a good
catch.

> -
>  	goto out;
>  
>  out_balanced:
> @@ -3403,11 +3376,7 @@ out_one_pinned:
>  			(sd->balance_interval < sd->max_interval))
>  		sd->balance_interval *= 2;
>  
> -	if (!sd_idle && sd->flags & SD_SHARE_CPUPOWER &&
> -	    !test_sd_parent(sd, SD_POWERSAVINGS_BALANCE))
> -		ld_moved = -1;
> -	else
> -		ld_moved = 0;

Ack.  But why did we have to flag this case earlier?

> +	ld_moved = 0;
>  out:
>  	return ld_moved;
>  }

--Vaidy
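
For readers who want to reason about what the removed checks did, the
following is a rough user-space model of the old sd_idle flow, pieced
together from the hunks quoted in this mail. It is only a sketch, not
kernel code: struct sd_model, parent_has_flag(), old_sd_idle() and
old_check_vetoes_active_balance() are made-up names, the flag values
are placeholders rather than the kernel's definitions, and
parent_has_flag() assumes test_sd_parent() simply tests a flag on the
parent domain.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder flag values for illustration; not the kernel's definitions. */
#define SD_SHARE_CPUPOWER        0x1
#define SD_POWERSAVINGS_BALANCE  0x2

/* Minimal stand-in for struct sched_domain: only the fields the checks read. */
struct sd_model {
	int flags;
	const struct sd_model *parent;
};

/* Assumed behaviour of test_sd_parent(): parent exists and has the flag. */
static bool parent_has_flag(const struct sd_model *sd, int flag)
{
	return sd->parent != NULL && (sd->parent->flags & flag) != 0;
}

/*
 * Models the sd_idle value the old code carried through one balance pass:
 * seeded in load_balance() (this_cpu_idle stands for idle != CPU_NOT_IDLE),
 * then cleared in update_sg_lb_stats() as soon as any scanned runqueue
 * had nr_running != 0.
 */
static bool old_sd_idle(const struct sd_model *sd, bool this_cpu_idle,
			const unsigned int *nr_running, size_t ncpus)
{
	bool sd_idle = this_cpu_idle &&
		       (sd->flags & SD_SHARE_CPUPOWER) != 0 &&
		       !parent_has_flag(sd, SD_POWERSAVINGS_BALANCE);

	for (size_t i = 0; i < ncpus; i++)
		if (sd_idle && nr_running[i])
			sd_idle = false;
	return sd_idle;
}

/*
 * Models the removed early "return 0" in the CPU_NEWLY_IDLE branch of
 * need_active_balance(): true means active balancing was refused.
 */
static bool old_check_vetoes_active_balance(const struct sd_model *sd,
					    bool sd_idle)
{
	return !sd_idle &&
	       (sd->flags & SD_SHARE_CPUPOWER) != 0 &&
	       !parent_has_flag(sd, SD_POWERSAVINGS_BALANCE);
}
```

In this model, an SMT domain with one busy sibling and no power-savings
flag on its parent ends up with sd_idle == 0, so the old check refuses
active balancing; with the flag set on the parent, the check never
fires. That matches the semi-idle core case described above.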
Thread overview: 24+ messages
2011-02-04 20:51 [PATCH] sched: Resolve sd_idle and first_idle_cpu Catch-22 Venkatesh Pallipadi
2011-02-04 21:25 ` [PATCH] sched: Resolve sd_idle and first_idle_cpu Catch-22 - v1 Venkatesh Pallipadi
2011-02-07 13:50 ` Peter Zijlstra
2011-02-07 18:21 ` Venkatesh Pallipadi
2011-02-07 19:53 ` Suresh Siddha
2011-02-08 17:37 ` Venkatesh Pallipadi
2011-02-08 18:13 ` Misc sd_idle related fixes Venkatesh Pallipadi
2011-02-09  9:29 ` Peter Zijlstra
2011-02-10 17:24 ` Venkatesh Pallipadi
2011-02-08 18:13 ` [PATCH 1/3] sched: Resolve sd_idle and first_idle_cpu Catch-22 Venkatesh Pallipadi
2011-02-08 18:13 ` [PATCH 2/3] sched: fix_up broken SMT load balance dilation Venkatesh Pallipadi
2011-02-08 18:13 ` [PATCH 3/3] sched: newidle balance set idle_timestamp only on successful pull Venkatesh Pallipadi
2011-02-09  3:37 ` Mike Galbraith
2011-02-09 15:55 ` [PATCH] sched: Resolve sd_idle and first_idle_cpu Catch-22 - v1 Peter Zijlstra
2011-02-12  1:20 ` Suresh Siddha
2011-02-14 22:38 ` [PATCH] sched: Wholesale removal of sd_idle logic Venkatesh Pallipadi
2011-02-15 17:01 ` Vaidyanathan Srinivasan [this message]
2011-02-15 18:26 ` Venkatesh Pallipadi
2011-02-16  8:53 ` Vaidyanathan Srinivasan
2011-02-16 11:43 ` Peter Zijlstra
2011-02-16 13:50 ` [tip:sched/core] " tip-bot for Venkatesh Pallipadi
2011-02-15  9:15 ` [PATCH] sched: Resolve sd_idle and first_idle_cpu Catch-22 - v1 Peter Zijlstra
2011-02-15 19:11 ` Suresh Siddha
2011-02-18  1:05 ` Alex,Shi