LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Andrew Morton <akpm@linux-foundation.org>
To: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: mingo@elte.hu, linux-kernel@vger.kernel.org, npiggin@suse.de,
	rostedt@goodmis.org
Subject: Re: [patch 2/2] sched: dynticks idle load balancing - v2
Date: Wed, 21 Feb 2007 12:23:44 -0800	[thread overview]
Message-ID: <20070221122344.564fb44f.akpm@linux-foundation.org> (raw)
In-Reply-To: <20070216180842.B8744@unix-os.sc.intel.com>

On Fri, 16 Feb 2007 18:08:42 -0800
"Siddha, Suresh B" <suresh.b.siddha@intel.com> wrote:

> Changes since v1:
> 
>   - Move the idle load balancer selection from schedule()
>     to the first busy scheduler_tick() after restarting the tick.
>     This will avoid the unnecessay ownership changes when 
>     softirq's(which are run in softirqd context in certain -rt
>     configurations) like timer, sched are invoked for every idle tick
>     that happens.
> 
>   - Misc cleanups.
> 
> ---
> Fix the process idle load balancing in the presence of dynticks.
> cpus for which ticks are stopped will sleep till the next event wakes it up.
> Potentially these sleeps can be for large durations and during which today,
> there is no periodic idle load balancing being done.
> 
> This patch nominates an owner among the idle cpus, which does the idle load
> balancing on behalf of the other idle cpus. And once all the cpus are
> completely idle, then we can stop this idle load balancing too. Checks added
> in fast path are minimized.  Whenever there are busy cpus in the system, there
> will be an owner(idle cpu) doing the system wide idle load balancing.
> 
> Open items:
> 1. Intelligent owner selection (like an idle core in a busy package).
> 2. Merge with rcu's nohz_cpu_mask?
> 

I spose I'll hold my nose and merge this, but it creates too much of a mess
to be mergeable into the CPU scheduler, IMO.

Can we please do something to reduce the ifdef density?  And if possible,
all the newly-added returns-from-the-middle-of-a-function?


> +#ifdef CONFIG_NO_HZ
> +static struct {
> +	int load_balancer;
> +	cpumask_t  cpu_mask;
> +} nohz ____cacheline_aligned = {
> +	.load_balancer = -1,
> +	.cpu_mask = CPU_MASK_NONE,
> +};
> +
> +/*
> + * This routine will try to nominate the ilb (idle load balancing)
> + * owner among the cpus whose ticks are stopped. ilb owner will do the idle
> + * load balancing on behalf of all those cpus. If all the cpus in the system
> + * go into this tickless mode, then there will be no ilb owner (as there is
> + * no need for one) and all the cpus will sleep till the next wakeup event
> + * arrives...
> + *
> + * For the ilb owner, tick is not stopped. And this tick will be used
> + * for idle load balancing. ilb owner will still be part of
> + * nohz.cpu_mask..
> + *
> + * While stopping the tick, this cpu will become the ilb owner if there
> + * is no other owner. And will be the owner till that cpu becomes busy
> + * or if all cpus in the system stop their ticks at which point
> + * there is no need for ilb owner.
> + *
> + * When the ilb owner becomes busy, it nominates another owner, during the
> + * next busy scheduler_tick()
> + */
> +int select_nohz_load_balancer(int stop_tick)
> +{
> +	int cpu = smp_processor_id();
> +
> +	if (stop_tick) {
> +		cpu_set(cpu, nohz.cpu_mask);
> +		cpu_rq(cpu)->in_nohz_recently = 1;
> +
> +		/*
> +		 * If we are going offline and still the leader, give up!
> +		 */
> +		if (cpu_is_offline(cpu) && nohz.load_balancer == cpu) {
> +			if (cmpxchg(&nohz.load_balancer,  cpu, -1) != cpu)

So we require that architectures which implement CONFIG_NO_HZ also
implement cmpxchg.

> +				BUG();
> +			return 0;
> +		}
> +
> +		/* time for ilb owner also to sleep */
> +		if (cpus_weight(nohz.cpu_mask) == num_online_cpus()) {
> +			if (nohz.load_balancer == cpu)
> +				nohz.load_balancer = -1;
> +			return 0;
> +		}
> +
> +		if (nohz.load_balancer == -1) {
> +			/* make me the ilb owner */
> +			if (cmpxchg(&nohz.load_balancer, -1, cpu) == -1)
> +				return 1;
> +		} else if (nohz.load_balancer == cpu)
> +			return 1;
> +	} else {
> +		if (!cpu_isset(cpu, nohz.cpu_mask))
> +			return 0;
> +
> +		cpu_clear(cpu, nohz.cpu_mask);
> +
> +		if (nohz.load_balancer == cpu)
> +			if (cmpxchg(&nohz.load_balancer,  cpu, -1) != cpu)
> +				BUG();
> +	}
> +	return 0;
> +}
> +#endif
> +
>  /*
>   * run_rebalance_domains is triggered when needed from the scheduler tick.
>   *
> @@ -3347,15 +3437,46 @@ static DEFINE_SPINLOCK(balancing);
>  
>  static void run_rebalance_domains(struct softirq_action *h)
>  {
> -	int this_cpu = smp_processor_id(), balance = 1;
> -	struct rq *this_rq = cpu_rq(this_cpu);
> -	unsigned long interval;
> +	int balance_cpu = smp_processor_id(), balance;
> +	struct rq *balance_rq = cpu_rq(balance_cpu);
> +	unsigned long interval, next_balance;

One definition per line is preferred.

>  	struct sched_domain *sd;
> -	enum idle_type idle = this_rq->idle_at_tick ? SCHED_IDLE : NOT_IDLE;
> +	enum idle_type idle;
> +
> +#ifdef CONFIG_NO_HZ
> +	cpumask_t cpus = nohz.cpu_mask;
> +	int local_cpu = balance_cpu;
> +	struct rq *local_rq = balance_rq;
> +
> +restart:
> +
> +	if (local_rq->idle_at_tick && nohz.load_balancer == local_cpu) {
> +		if (need_resched())
> +			return;
> +
> +		balance_cpu = first_cpu(cpus);
> +		if (unlikely(balance_cpu >= NR_CPUS))
> +			return;
> +		balance_rq = cpu_rq(balance_cpu);
> +		cpu_clear(balance_cpu, cpus);
> +	}
> +
> +	/*
> +	 * We must be doing idle load balancing for some other idle cpu
> +	 */
> +	if (local_cpu != balance_cpu)
> +		idle = SCHED_IDLE;
> +	else
> +#endif
> +		idle = balance_rq->idle_at_tick ? SCHED_IDLE : NOT_IDLE;
> +

It's not just that this is an eyesore (it is) - it introduces some ongoing
maintenance problems.  I expect this can be pulled out into a separate
function with no loss in quality of generated code.

And that function would be a nice site for a nice comment explaining the
design.  Why do we return if need_resched()?  What is the significance of
(balance_cpu >= NR_CPUS)?


> +
>  	/* Earliest time when we have to call run_rebalance_domains again */
> -	unsigned long next_balance = jiffies + 60*HZ;
> +	next_balance = jiffies + 60*HZ;
> +
> +	for_each_domain(balance_cpu, sd) {
>  
> -	for_each_domain(this_cpu, sd) {
>  		if (!(sd->flags & SD_LOAD_BALANCE))
>  			continue;
>  
> @@ -3374,7 +3495,7 @@ static void run_rebalance_domains(struct
>  		}
>  
>  		if (time_after_eq(jiffies, sd->last_balance + interval)) {
> -			if (load_balance(this_cpu, this_rq, sd, idle, &balance)) {
> +			if (load_balance(balance_cpu, balance_rq, sd, idle, &balance)) {
>  				/*
>  				 * We've pulled tasks over so either we're no
>  				 * longer idle, or one of our SMT siblings is
> @@ -3398,7 +3519,16 @@ out:
>  		if (!balance)
>  			break;
>  	}
> -	this_rq->next_balance = next_balance;
> +	balance_rq->next_balance = next_balance;
> +
> +#ifdef CONFIG_NO_HZ
> +	if (local_rq->idle_at_tick && nohz.load_balancer == local_cpu) {
> +		if (time_after(next_balance, local_rq->next_balance))
> +			local_rq->next_balance = next_balance;
> +	    	if (!cpus_empty(cpus))
> +			goto restart;
> +	}
> +#endif
>  }

A goto in an ifdef :(

Perhaps it can be removed by teaching the caller to call this function again?



  reply	other threads:[~2007-02-21 20:23 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-12-11 23:53 [RFC] Patch: dynticks: idle load balancing Siddha, Suresh B
2006-12-13 22:43 ` Ingo Molnar
2006-12-13 23:13   ` Ingo Molnar
2006-12-13 23:03     ` Siddha, Suresh B
2006-12-13 23:31       ` Ingo Molnar
2006-12-13 23:19         ` Siddha, Suresh B
2006-12-14  0:34           ` Ingo Molnar
2006-12-19 20:12           ` Ingo Molnar
2006-12-19 21:12             ` Siddha, Suresh B
2007-01-16 11:35               ` Ingo Molnar
2007-01-30 21:57                 ` Siddha, Suresh B
2007-02-07 22:19                   ` Siddha, Suresh B
2007-02-17  2:03                   ` [patch 1/2] sched: fix idle load balancing in softirqd context Siddha, Suresh B
2007-02-17  2:08                     ` [patch 2/2] sched: dynticks idle load balancing - v2 Siddha, Suresh B
2007-02-21 20:23                       ` Andrew Morton [this message]
2007-02-22  3:14                         ` Nick Piggin
2007-02-24  2:01                         ` [patch] sched: dynticks idle load balancing - v3 Siddha, Suresh B
2007-02-22  3:26                       ` [patch 2/2] sched: dynticks idle load balancing - v2 Nick Piggin
2007-02-22 22:33                         ` Siddha, Suresh B
2007-02-23  3:43                           ` Nick Piggin
2007-02-17 14:42                     ` [patch 1/2] sched: fix idle load balancing in softirqd context Steven Rostedt
2007-02-21  6:25                       ` Siddha, Suresh B
2007-02-21 20:13                     ` Andrew Morton
2006-12-13 23:48     ` [RFC] Patch: dynticks: idle load balancing Ingo Molnar
2006-12-20  0:49 ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070221122344.564fb44f.akpm@linux-foundation.org \
    --to=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=npiggin@suse.de \
    --cc=rostedt@goodmis.org \
    --cc=suresh.b.siddha@intel.com \
    --subject='Re: [patch 2/2] sched: dynticks idle load balancing - v2' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).