LKML Archive on lore.kernel.org
* [PATCH RESEND v10] sched/deadline: support dl task migration during cpu hotplug
@ 2015-03-13  7:27 Wanpeng Li
  2015-03-16 12:09 ` Wanpeng Li
  2015-03-16 15:01 ` Ingo Molnar
  0 siblings, 2 replies; 17+ messages in thread
From: Wanpeng Li @ 2015-03-13  7:27 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra; +Cc: Juri Lelli, linux-kernel, Wanpeng Li

I observe that a dl task can't be migrated to other cpus during cpu hotplug;
in addition, the task may or may not run again if the cpu is added back. The
root cause I found is that the dl task is throttled and removed from the dl
rq after consuming all of its budget, so the stop task can't pick it up from
the dl rq and migrate it to other cpus during hotplug.

The method to reproduce:
schedtool -E -t 50000:100000 -e ./test
Here ./test is just a simple for loop. Observe which cpu the test task is
on, then take that cpu offline:
echo 0 > /sys/devices/system/cpu/cpuN/online
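
The source of ./test is not included in this mail; it is described only as a
simple for loop. A minimal sketch of such a CPU-bound loop, assuming nothing
beyond that description, could look like this (the name spin() and the
iteration parameter are hypothetical and exist only so the loop can be
bounded):

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the body of ./test: burn CPU time by adding
 * the loop counter into a volatile accumulator, so the compiler cannot
 * optimize the loop away. Returns the accumulated sum.
 */
static uint64_t spin(uint64_t iterations)
{
	volatile uint64_t sink = 0;

	for (uint64_t i = 0; i < iterations; i++)
		sink += i;

	return sink;
}
```

Calling spin() from main() with a very large iteration count, or in an
endless outer loop, would keep the task runnable long enough to exhaust its
50000us budget in every 100000us period and get throttled.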

This patch adds dl task migration during cpu hotplug: if the current rq is
offline when the dl timer fires, it finds the most suitable later-deadline
rq. If that fails, it falls back to any eligible online cpu so that the
deadline task will come back to us, and the push/pull mechanism should then
move it around properly.

Suggested-and-acked-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
---
 kernel/sched/deadline.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 5cb5c9c..db457b9 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -492,6 +492,7 @@ static int start_dl_timer(struct sched_dl_entity *dl_se, bool boosted)
 	return hrtimer_active(&dl_se->dl_timer);
 }
 
+static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq);
 /*
  * This is the bandwidth enforcement timer callback. If here, we know
  * a task is not on its dl_rq, since the fact that the timer was running
@@ -537,6 +538,59 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
 	update_rq_clock(rq);
 
 	/*
+	 * So if we find that the rq the task was on is no longer
+	 * available, we need to select a new rq.
+	 */
+	if (unlikely(!rq->online)) {
+		struct rq *later_rq = NULL;
+		bool fallback = false;
+
+		later_rq = find_lock_later_rq(p, rq);
+
+		if (!later_rq) {
+			int cpu;
+
+			/*
+			 * If cannot preempt any rq, fallback to pick any
+			 * online cpu.
+			 */
+			fallback = true;
+			cpu = cpumask_any_and(cpu_active_mask,
+						tsk_cpus_allowed(p));
+			if (cpu >= nr_cpu_ids) {
+				if (dl_bandwidth_enabled()) {
+					/*
+					 * Fail to find any suitable cpu.
+					 * The task will never come back!
+					 */
+					WARN_ON(1);
+					goto unlock;
+				} else {
+					/*
+					 * If admission control is disabled we
+					 * try a little harder to let the task
+					 * run.
+					 */
+					cpu = cpumask_any(cpu_active_mask);
+				}
+			}
+			later_rq = cpu_rq(cpu);
+			double_lock_balance(rq, later_rq);
+		}
+
+		deactivate_task(rq, p, 0);
+		set_task_cpu(p, later_rq->cpu);
+		activate_task(later_rq, p, ENQUEUE_REPLENISH);
+
+		if (!fallback)
+			resched_curr(later_rq);
+
+		double_unlock_balance(rq, later_rq);
+
+		goto unlock;
+	}
+
+	/*
 	 * If the throttle happened during sched-out; like:
 	 *
 	 *   schedule()
-- 
1.9.1
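
The order in which the hunk above picks a target cpu can be modeled in user
space. This is a simplified sketch, not kernel code: pick_fallback_cpu(),
mask_any() and NO_CPU are hypothetical names, cpu masks are plain 64-bit
words, the result of find_lock_later_rq() is passed in as a cpu number, and
all locking is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NO_CPU (-1)	/* hypothetical "nothing found" marker */

/* Return the lowest cpu set in mask, or NO_CPU if the mask is empty. */
static int mask_any(uint64_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;

	return NO_CPU;
}

/*
 * Model of the selection order in the patch:
 *   1. the later-deadline cpu, if one was found;
 *   2. otherwise any active cpu the task is allowed to run on;
 *   3. otherwise, only with admission control disabled, any active cpu;
 *   4. otherwise give up (the WARN_ON(1) path in the patch).
 */
static int pick_fallback_cpu(int later_cpu, uint64_t active,
			     uint64_t allowed, bool admission_control)
{
	int cpu;

	if (later_cpu != NO_CPU)
		return later_cpu;

	cpu = mask_any(active & allowed);
	if (cpu != NO_CPU)
		return cpu;

	if (admission_control)
		return NO_CPU;

	return mask_any(active);
}
```

With cpus 1 and 2 active (mask 0x6) and a task allowed only on cpu 0, the
model returns NO_CPU when admission control is enabled and cpu 1 when it is
disabled, which is the case the WARN_ON(1) question in this thread is about.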



* Re: [PATCH RESEND v10] sched/deadline: support dl task migration during cpu hotplug
  2015-03-13  7:27 [PATCH RESEND v10] sched/deadline: support dl task migration during cpu hotplug Wanpeng Li
@ 2015-03-16 12:09 ` Wanpeng Li
  2015-03-16 15:01 ` Ingo Molnar
  1 sibling, 0 replies; 17+ messages in thread
From: Wanpeng Li @ 2015-03-16 12:09 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Wanpeng Li, Peter Zijlstra, Juri Lelli, linux-kernel

Ping Ingo,
On Fri, Mar 13, 2015 at 03:27:27PM +0800, Wanpeng Li wrote:
>I observe that a dl task can't be migrated to other cpus during cpu hotplug;
>in addition, the task may or may not run again if the cpu is added back. The
>root cause I found is that the dl task is throttled and removed from the dl
>rq after consuming all of its budget, so the stop task can't pick it up from
>the dl rq and migrate it to other cpus during hotplug.
>
>The method to reproduce:
>schedtool -E -t 50000:100000 -e ./test
>Here ./test is just a simple for loop. Observe which cpu the test task is
>on, then take that cpu offline:
>echo 0 > /sys/devices/system/cpu/cpuN/online
>
>This patch adds dl task migration during cpu hotplug: if the current rq is
>offline when the dl timer fires, it finds the most suitable later-deadline
>rq. If that fails, it falls back to any eligible online cpu so that the
>deadline task will come back to us, and the push/pull mechanism should then
>move it around properly.
>
>Suggested-and-acked-by: Juri Lelli <juri.lelli@arm.com>
>Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
>---
> kernel/sched/deadline.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 54 insertions(+)
>
>diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
>index 5cb5c9c..db457b9 100644
>--- a/kernel/sched/deadline.c
>+++ b/kernel/sched/deadline.c
>@@ -492,6 +492,7 @@ static int start_dl_timer(struct sched_dl_entity *dl_se, bool boosted)
> 	return hrtimer_active(&dl_se->dl_timer);
> }
> 
>+static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq);
> /*
>  * This is the bandwidth enforcement timer callback. If here, we know
>  * a task is not on its dl_rq, since the fact that the timer was running
>@@ -537,6 +538,59 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
> 	update_rq_clock(rq);
> 
> 	/*
>+	 * So if we find that the rq the task was on is no longer
>+	 * available, we need to select a new rq.
>+	 */
>+	if (unlikely(!rq->online)) {
>+		struct rq *later_rq = NULL;
>+		bool fallback = false;
>+
>+		later_rq = find_lock_later_rq(p, rq);
>+
>+		if (!later_rq) {
>+			int cpu;
>+
>+			/*
>+			 * If cannot preempt any rq, fallback to pick any
>+			 * online cpu.
>+			 */
>+			fallback = true;
>+			cpu = cpumask_any_and(cpu_active_mask,
>+						tsk_cpus_allowed(p));
>+			if (cpu >= nr_cpu_ids) {
>+				if (dl_bandwidth_enabled()) {
>+					/*
>+					 * Fail to find any suitable cpu.
>+					 * The task will never come back!
>+					 */
>+					WARN_ON(1);
>+					goto unlock;
>+				} else {
>+					/*
>+					 * If admission control is disabled we
>+					 * try a little harder to let the task
>+					 * run.
>+					 */
>+					cpu = cpumask_any(cpu_active_mask);
>+				}
>+			}
>+			later_rq = cpu_rq(cpu);
>+			double_lock_balance(rq, later_rq);
>+		}
>+
>+		deactivate_task(rq, p, 0);
>+		set_task_cpu(p, later_rq->cpu);
>+		activate_task(later_rq, p, ENQUEUE_REPLENISH);
>+
>+		if (!fallback)
>+			resched_curr(later_rq);
>+
>+		double_unlock_balance(rq, later_rq);
>+
>+		goto unlock;
>+	}
>+
>+	/*
> 	 * If the throttle happened during sched-out; like:
> 	 *
> 	 *   schedule()
>-- 
>1.9.1


* Re: [PATCH RESEND v10] sched/deadline: support dl task migration during cpu hotplug
  2015-03-13  7:27 [PATCH RESEND v10] sched/deadline: support dl task migration during cpu hotplug Wanpeng Li
  2015-03-16 12:09 ` Wanpeng Li
@ 2015-03-16 15:01 ` Ingo Molnar
  2015-03-16 23:01   ` Wanpeng Li
  1 sibling, 1 reply; 17+ messages in thread
From: Ingo Molnar @ 2015-03-16 15:01 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, linux-kernel


* Wanpeng Li <wanpeng.li@linux.intel.com> wrote:

> I observe that a dl task can't be migrated to other cpus during cpu hotplug;
> in addition, the task may or may not run again if the cpu is added back. The
> root cause I found is that the dl task is throttled and removed from the dl
> rq after consuming all of its budget, so the stop task can't pick it up from
> the dl rq and migrate it to other cpus during hotplug.
> 
> The method to reproduce:
> schedtool -E -t 50000:100000 -e ./test
> Here ./test is just a simple for loop. Observe which cpu the test task is
> on, then take that cpu offline:
> echo 0 > /sys/devices/system/cpu/cpuN/online
> 
> This patch adds dl task migration during cpu hotplug: if the current rq is
> offline when the dl timer fires, it finds the most suitable later-deadline
> rq. If that fails, it falls back to any eligible online cpu so that the
> deadline task will come back to us, and the push/pull mechanism should then
> move it around properly.
> 
> Suggested-and-acked-by: Juri Lelli <juri.lelli@arm.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
> ---
>  kernel/sched/deadline.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 54 insertions(+)
> 
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 5cb5c9c..db457b9 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -492,6 +492,7 @@ static int start_dl_timer(struct sched_dl_entity *dl_se, bool boosted)
>  	return hrtimer_active(&dl_se->dl_timer);
>  }
>  
> +static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq);
>  /*
>   * This is the bandwidth enforcement timer callback. If here, we know
>   * a task is not on its dl_rq, since the fact that the timer was running
> @@ -537,6 +538,59 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
>  	update_rq_clock(rq);
>  
>  	/*
> +	 * So if we find that the rq the task was on is no longer
> +	 * available, we need to select a new rq.
> +	 */
> +	if (unlikely(!rq->online)) {
> +		struct rq *later_rq = NULL;
> +		bool fallback = false;
> +
> +		later_rq = find_lock_later_rq(p, rq);
> +
> +		if (!later_rq) {
> +			int cpu;
> +
> +			/*
> +			 * If cannot preempt any rq, fallback to pick any
> +			 * online cpu.

s/If cannot/If we cannot
s/fallback/fall back

> +			 */
> +			fallback = true;
> +			cpu = cpumask_any_and(cpu_active_mask,
> +						tsk_cpus_allowed(p));

shouldn't be on separate lines - but this is also a sign that the guts 
of this new code should be in a helper function, not inside several 
layers of branches.

> +			if (cpu >= nr_cpu_ids) {
> +				if (dl_bandwidth_enabled()) {
> +					/*
> +					 * Fail to find any suitable cpu.
> +					 * The task will never come back!
> +					 */
> +					WARN_ON(1);

Can this condition happen to users with a non-buggy kernel?

Thanks,

	Ingo


* Re: [PATCH RESEND v10] sched/deadline: support dl task migration during cpu hotplug
  2015-03-16 15:01 ` Ingo Molnar
@ 2015-03-16 23:01   ` Wanpeng Li
  2015-03-17  8:06     ` Ingo Molnar
  0 siblings, 1 reply; 17+ messages in thread
From: Wanpeng Li @ 2015-03-16 23:01 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Peter Zijlstra, Juri Lelli, linux-kernel, Wanpeng Li

Hi Ingo,
On Mon, Mar 16, 2015 at 04:01:02PM +0100, Ingo Molnar wrote:
>> +
>> +			/*
>> +			 * If cannot preempt any rq, fallback to pick any
>> +			 * online cpu.
>
>s/If cannot/If we cannot
>s/fallback/fall back

Will do.

>
>> +			 */
>> +			fallback = true;
>> +			cpu = cpumask_any_and(cpu_active_mask,
>> +						tsk_cpus_allowed(p));
>
>shouldn't be on separate lines - but this is also a sign that the guts 

Otherwise there is a "WARNING: line over 80 characters".

>of this new code should be in a helper function, not inside several 
>layers of branches.

Do you mean the whole patch should be in a helper function?

>
>> +			if (cpu >= nr_cpu_ids) {
>> +				if (dl_bandwidth_enabled()) {
>> +					/*
>> +					 * Fail to find any suitable cpu.
>> +					 * The task will never come back!
>> +					 */
>> +					WARN_ON(1);
>
>Can this condition happen to users with a non-buggy kernel?

What do you prefer? ;-)

Regards,
Wanpeng Li



* Re: [PATCH RESEND v10] sched/deadline: support dl task migration during cpu hotplug
  2015-03-17  8:06     ` Ingo Molnar
@ 2015-03-17  7:53       ` Wanpeng Li
  2015-03-17  8:13         ` Ingo Molnar
  0 siblings, 1 reply; 17+ messages in thread
From: Wanpeng Li @ 2015-03-17  7:53 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Peter Zijlstra, Juri Lelli, linux-kernel, Wanpeng Li

Hi Ingo,
On Tue, Mar 17, 2015 at 09:06:13AM +0100, Ingo Molnar wrote:
>
>* Wanpeng Li <wanpeng.li@linux.intel.com> wrote:
>
>> Hi Ingo,
>> On Mon, Mar 16, 2015 at 04:01:02PM +0100, Ingo Molnar wrote:
>> >> +
>> >> +			/*
>> >> +			 * If cannot preempt any rq, fallback to pick any
>> >> +			 * online cpu.
>> >
>> >s/If cannot/If we cannot
>> >s/fallback/fall back
>> 
>> Will do.
>> 
>> >
>> >> +			 */
>> >> +			fallback = true;
>> >> +			cpu = cpumask_any_and(cpu_active_mask,
>> >> +						tsk_cpus_allowed(p));
>> >
>> >shouldn't be on separate lines - but this is also a sign that the guts 
>> 
>> Otherwise there is a "WARNING: line over 80 characters".
>
>Yes, but did your reaction to that tool's warning improve the code? I 
>don't think so. If you do what I suggested and reduce indentation a bit,
>you'll fix the warning _and_ improve the code. Win-win.

Cool, will do.

>
>> > of this new code should be in a helper function, not inside 
>> > several layers of branches.
>> 
>> Do you mean the whole patch should be in a helper function?
>
>Probably.

Will do.

>
>> >> +			if (cpu >= nr_cpu_ids) {
>> >> +				if (dl_bandwidth_enabled()) {
>> >> +					/*
>> >> +					 * Fail to find any suitable cpu.
>> >> +					 * The task will never come back!
>> >> +					 */
>> >> +					WARN_ON(1);
>> >
>> > Can this condition happen to users with a non-buggy kernel?
>> 
>> What do you prefer? ;-)
>
>That was a yes/no question: can this condition trigger on correctly 
>working kernels?

How about adding unlikely() here?

Regards,
Wanpeng Li 

>
>Thanks,
>
>	Ingo


* Re: [PATCH RESEND v10] sched/deadline: support dl task migration during cpu hotplug
  2015-03-17  8:13         ` Ingo Molnar
@ 2015-03-17  7:59           ` Wanpeng Li
  2015-03-23  7:25             ` Ingo Molnar
  0 siblings, 1 reply; 17+ messages in thread
From: Wanpeng Li @ 2015-03-17  7:59 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Wanpeng Li, Peter Zijlstra, Juri Lelli, linux-kernel

On Tue, Mar 17, 2015 at 09:13:02AM +0100, Ingo Molnar wrote:
>
>* Wanpeng Li <wanpeng.li@linux.intel.com> wrote:
>
>> Hi Ingo,
>> On Tue, Mar 17, 2015 at 09:06:13AM +0100, Ingo Molnar wrote:
>> >
>> >* Wanpeng Li <wanpeng.li@linux.intel.com> wrote:
>> >
>> >> Hi Ingo,
>> >> On Mon, Mar 16, 2015 at 04:01:02PM +0100, Ingo Molnar wrote:
>> >> >> +
>> >> >> +			/*
>> >> >> +			 * If cannot preempt any rq, fallback to pick any
>> >> >> +			 * online cpu.
>> >> >
>> >> >s/If cannot/If we cannot
>> >> >s/fallback/fall back
>> >> 
>> >> Will do.
>> >> 
>> >> >
>> >> >> +			 */
>> >> >> +			fallback = true;
>> >> >> +			cpu = cpumask_any_and(cpu_active_mask,
>> >> >> +						tsk_cpus_allowed(p));
>> >> >
>> >> >shouldn't be on separate lines - but this is also a sign that the guts 
>> >> 
>> >> Otherwise there is a "WARNING: line over 80 characters".
>> >
>> >Yes, but did your reaction to that tool's warning improve the code? I 
>> >don't think so. If you do what I suggested and reduce indentation a bit,
>> >you'll fix the warning _and_ improve the code. Win-win.
>> 
>> Cool, will do.
>> 
>> >
>> >> > of this new code should be in a helper function, not inside 
>> >> > several layers of branches.
>> >> 
>> >> Do you mean the whole patch should be in a helper function?
>> >
>> >Probably.
>> 
>> Will do.
>> 
>> >
>> >> >> +			if (cpu >= nr_cpu_ids) {
>> >> >> +				if (dl_bandwidth_enabled()) {
>> >> >> +					/*
>> >> >> +					 * Fail to find any suitable cpu.
>> >> >> +					 * The task will never come back!
>> >> >> +					 */
>> >> >> +					WARN_ON(1);
>> >> >
>> >> > Can this condition happen to users with a non-buggy kernel?
>> >> 
>> >> What do you prefer? ;-)
>> >
>> >That was a yes/no question: can this condition trigger on correctly 
>> >working kernels?
>> 
>> How about adding unlikely() here?
>
>Please answer my question: can this condition trigger on correctly 
>working kernels? I think so, but maybe I'm wrong?

I haven't seen it happen; I added this at Juri's suggestion, so maybe he
can explain more.

Ping Juri, ;-)

Regards,
Wanpeng Li

>
>Yes/no.
>
>Thanks,
>
>	Ingo


* Re: [PATCH RESEND v10] sched/deadline: support dl task migration during cpu hotplug
  2015-03-16 23:01   ` Wanpeng Li
@ 2015-03-17  8:06     ` Ingo Molnar
  2015-03-17  7:53       ` Wanpeng Li
  0 siblings, 1 reply; 17+ messages in thread
From: Ingo Molnar @ 2015-03-17  8:06 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: Peter Zijlstra, Juri Lelli, linux-kernel


* Wanpeng Li <wanpeng.li@linux.intel.com> wrote:

> Hi Ingo,
> On Mon, Mar 16, 2015 at 04:01:02PM +0100, Ingo Molnar wrote:
> >> +
> >> +			/*
> >> +			 * If cannot preempt any rq, fallback to pick any
> >> +			 * online cpu.
> >
> >s/If cannot/If we cannot
> >s/fallback/fall back
> 
> Will do.
> 
> >
> >> +			 */
> >> +			fallback = true;
> >> +			cpu = cpumask_any_and(cpu_active_mask,
> >> +						tsk_cpus_allowed(p));
> >
> >shouldn't be on separate lines - but this is also a sign that the guts 
> 
> Otherwise there is a "WARNING: line over 80 characters".

Yes, but did your reaction to that tool's warning improve the code? I 
don't think so. If you do what I suggested and reduce indentation a bit,
you'll fix the warning _and_ improve the code. Win-win.

> > of this new code should be in a helper function, not inside 
> > several layers of branches.
> 
> Do you mean the whole patch should be in a helper function?

Probably.

> >> +			if (cpu >= nr_cpu_ids) {
> >> +				if (dl_bandwidth_enabled()) {
> >> +					/*
> >> +					 * Fail to find any suitable cpu.
> >> +					 * The task will never come back!
> >> +					 */
> >> +					WARN_ON(1);
> >
> > Can this condition happen to users with a non-buggy kernel?
> 
> What do you prefer? ;-)

That was a yes/no question: can this condition trigger on correctly 
working kernels?

Thanks,

	Ingo


* Re: [PATCH RESEND v10] sched/deadline: support dl task migration during cpu hotplug
  2015-03-17  7:53       ` Wanpeng Li
@ 2015-03-17  8:13         ` Ingo Molnar
  2015-03-17  7:59           ` Wanpeng Li
  0 siblings, 1 reply; 17+ messages in thread
From: Ingo Molnar @ 2015-03-17  8:13 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: Peter Zijlstra, Juri Lelli, linux-kernel


* Wanpeng Li <wanpeng.li@linux.intel.com> wrote:

> Hi Ingo,
> On Tue, Mar 17, 2015 at 09:06:13AM +0100, Ingo Molnar wrote:
> >
> >* Wanpeng Li <wanpeng.li@linux.intel.com> wrote:
> >
> >> Hi Ingo,
> >> On Mon, Mar 16, 2015 at 04:01:02PM +0100, Ingo Molnar wrote:
> >> >> +
> >> >> +			/*
> >> >> +			 * If cannot preempt any rq, fallback to pick any
> >> >> +			 * online cpu.
> >> >
> >> >s/If cannot/If we cannot
> >> >s/fallback/fall back
> >> 
> >> Will do.
> >> 
> >> >
> >> >> +			 */
> >> >> +			fallback = true;
> >> >> +			cpu = cpumask_any_and(cpu_active_mask,
> >> >> +						tsk_cpus_allowed(p));
> >> >
> >> >shouldn't be on separate lines - but this is also a sign that the guts 
> >> 
> >> Otherwise there is a "WARNING: line over 80 characters".
> >
> >Yes, but did your reaction to that tool's warning improve the code? I 
> >don't think so. If you do what I suggested and reduce indentation a bit,
> >you'll fix the warning _and_ improve the code. Win-win.
> 
> Cool, will do.
> 
> >
> >> > of this new code should be in a helper function, not inside 
> >> > several layers of branches.
> >> 
> >> Do you mean the whole patch should be in a helper function?
> >
> >Probably.
> 
> Will do.
> 
> >
> >> >> +			if (cpu >= nr_cpu_ids) {
> >> >> +				if (dl_bandwidth_enabled()) {
> >> >> +					/*
> >> >> +					 * Fail to find any suitable cpu.
> >> >> +					 * The task will never come back!
> >> >> +					 */
> >> >> +					WARN_ON(1);
> >> >
> >> > Can this condition happen to users with a non-buggy kernel?
> >> 
> >> What do you prefer? ;-)
> >
> >That was a yes/no question: can this condition trigger on correctly 
> >working kernels?
> 
> How about adding unlikely() here?

Please answer my question: can this condition trigger on correctly 
working kernels? I think so, but maybe I'm wrong?

Yes/no.

Thanks,

	Ingo


* Re: [PATCH RESEND v10] sched/deadline: support dl task migration during cpu hotplug
  2015-03-17  7:59           ` Wanpeng Li
@ 2015-03-23  7:25             ` Ingo Molnar
  2015-03-23  8:55               ` Peter Zijlstra
  2015-03-24  9:50               ` Wanpeng Li
  0 siblings, 2 replies; 17+ messages in thread
From: Ingo Molnar @ 2015-03-23  7:25 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: Peter Zijlstra, Juri Lelli, linux-kernel


* Wanpeng Li <wanpeng.li@linux.intel.com> wrote:

> On Tue, Mar 17, 2015 at 09:13:02AM +0100, Ingo Molnar wrote:
> >
> >* Wanpeng Li <wanpeng.li@linux.intel.com> wrote:
> >
> >> Hi Ingo,
> >> On Tue, Mar 17, 2015 at 09:06:13AM +0100, Ingo Molnar wrote:
> >> >
> >> >* Wanpeng Li <wanpeng.li@linux.intel.com> wrote:
> >> >
> >> >> Hi Ingo,
> >> >> On Mon, Mar 16, 2015 at 04:01:02PM +0100, Ingo Molnar wrote:
> >> >> >> +
> >> >> >> +			/*
> >> >> >> +			 * If cannot preempt any rq, fallback to pick any
> >> >> >> +			 * online cpu.
> >> >> >
> >> >> >s/If cannot/If we cannot
> >> >> >s/fallback/fall back
> >> >> 
> >> >> Will do.
> >> >> 
> >> >> >
> >> >> >> +			 */
> >> >> >> +			fallback = true;
> >> >> >> +			cpu = cpumask_any_and(cpu_active_mask,
> >> >> >> +						tsk_cpus_allowed(p));
> >> >> >
> >> >> >shouldn't be on separate lines - but this is also a sign that the guts 
> >> >> 
> >> >> Otherwise there is a "WARNING: line over 80 characters".
> >> >
> >> >Yes, but did your reaction to that tool's warning improve the code? I 
> >> >don't think so. If you do what I suggested and reduce indentation a bit,
> >> >you'll fix the warning _and_ improve the code. Win-win.
> >> 
> >> Cool, will do.
> >> 
> >> >
> >> >> > of this new code should be in a helper function, not inside 
> >> >> > several layers of branches.
> >> >> 
> >> >> Do you mean the whole patch should be in a helper function?
> >> >
> >> >Probably.
> >> 
> >> Will do.
> >> 
> >> >
> >> >> >> +			if (cpu >= nr_cpu_ids) {
> >> >> >> +				if (dl_bandwidth_enabled()) {
> >> >> >> +					/*
> >> >> >> +					 * Fail to find any suitable cpu.
> >> >> >> +					 * The task will never come back!
> >> >> >> +					 */
> >> >> >> +					WARN_ON(1);
> >> >> >
> >> >> > Can this condition happen to users with a non-buggy kernel?
> >> >> 
> >> >> What do you prefer? ;-)
> >> >
> >> >That was a yes/no question: can this condition trigger on correctly 
> >> >working kernels?
> >> 
> >> How about adding unlikely() here?
> >
> >Please answer my question: can this condition trigger on correctly 
> >working kernels? I think so, but maybe I'm wrong?
> 
> I haven't seen it happen; I added this at Juri's suggestion, so maybe he
> can explain more.
> 
> Ping Juri, ;-)

I still haven't seen a satisfactory answer to this question. Please 
don't resend patches without clearing questions raised during review.

Thanks,

	Ingo


* Re: [PATCH RESEND v10] sched/deadline: support dl task migration during cpu hotplug
  2015-03-23  7:25             ` Ingo Molnar
@ 2015-03-23  8:55               ` Peter Zijlstra
  2015-03-24  9:27                 ` Juri Lelli
  2015-03-24  9:50               ` Wanpeng Li
  1 sibling, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2015-03-23  8:55 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Wanpeng Li, Juri Lelli, linux-kernel

On Mon, Mar 23, 2015 at 08:25:04AM +0100, Ingo Molnar wrote:
> > >> >> >> +			if (cpu >= nr_cpu_ids) {
> > >> >> >> +				if (dl_bandwidth_enabled()) {
> > >> >> >> +					/*
> > >> >> >> +					 * Fail to find any suitable cpu.
> > >> >> >> +					 * The task will never come back!
> > >> >> >> +					 */
> > >> >> >> +					WARN_ON(1);
> > >> >> >
> > >> >> > Can this condition happen to users with a non-buggy kernel?

> I still haven't seen a satisfactory answer to this question. Please 
> don't resend patches without clearing questions raised during review.

So I had a look on Friday, it _should_ not happen, but it does due to a
second bug Juri is currently chasing down.


* Re: [PATCH RESEND v10] sched/deadline: support dl task migration during cpu hotplug
  2015-03-24  9:27                 ` Juri Lelli
@ 2015-03-24  9:13                   ` Wanpeng Li
  2015-03-24 10:00                     ` Juri Lelli
  2015-03-30  9:12                   ` Peter Zijlstra
  1 sibling, 1 reply; 17+ messages in thread
From: Wanpeng Li @ 2015-03-24  9:13 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Peter Zijlstra, Ingo Molnar, Wanpeng Li, linux-kernel, juri.lelli

Hi Juri,
On Tue, Mar 24, 2015 at 09:27:09AM +0000, Juri Lelli wrote:
>Hi,
>
>On 23/03/2015 08:55, Peter Zijlstra wrote:
>> On Mon, Mar 23, 2015 at 08:25:04AM +0100, Ingo Molnar wrote:
>>>>>>>>>> +			if (cpu >= nr_cpu_ids) {
>>>>>>>>>> +				if (dl_bandwidth_enabled()) {
>>>>>>>>>> +					/*
>>>>>>>>>> +					 * Fail to find any suitable cpu.
>>>>>>>>>> +					 * The task will never come back!
>>>>>>>>>> +					 */
>>>>>>>>>> +					WARN_ON(1);
>>>>>>>>>
>>>>>>>>> Can this condition happen to users with a non-buggy kernel?
>> 
>>> I still haven't seen a satisfactory answer to this question. Please 
>>> don't resend patches without clearing questions raised during review.
>> 
>> So I had a look on Friday, it _should_ not happen, but it does due to a
>> second bug Juri is currently chasing down.
>> 
>
>Right, it should not happen. It happens because hotplug operations are
>destructive w.r.t. cpusets. Peter, how about we move the check you put
>in sched_cpu_inactive() to cpuset_cpu_inactive()? This way, if we fail,
>we don't need to destroy/rebuild the domains.

I remember you mentioned a bug on IRC last week; does this patch
solve it?

Regards,
Wanpeng Li 

>
>Thanks,
>
>- Juri
>
>From 65e8033e05f8b70116747062d00d5a5c266699fb Mon Sep 17 00:00:00 2001
>From: Juri Lelli <juri.lelli@gmail.com>
>Date: Tue, 24 Mar 2015 07:47:03 +0000
>Subject: [PATCH] sched/core: check for available -dl bandwidth in
> cpuset_cpu_inactive
>
>Signed-off-by: Juri Lelli <juri.lelli@arm.com>
>---
> kernel/sched/core.c | 56 ++++++++++++++++++++++++++---------------------------
> 1 file changed, 28 insertions(+), 28 deletions(-)
>
>diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>index 50927eb..3723ad0 100644
>--- a/kernel/sched/core.c
>+++ b/kernel/sched/core.c
>@@ -5318,36 +5318,13 @@ static int sched_cpu_active(struct notifier_block *nfb,
> static int sched_cpu_inactive(struct notifier_block *nfb,
> 					unsigned long action, void *hcpu)
> {
>-	unsigned long flags;
>-	long cpu = (long)hcpu;
>-	struct dl_bw *dl_b;
>-
> 	switch (action & ~CPU_TASKS_FROZEN) {
> 	case CPU_DOWN_PREPARE:
>-		set_cpu_active(cpu, false);
>-
>-		/* explicitly allow suspend */
>-		if (!(action & CPU_TASKS_FROZEN)) {
>-			bool overflow;
>-			int cpus;
>-
>-			rcu_read_lock_sched();
>-			dl_b = dl_bw_of(cpu);
>-
>-			raw_spin_lock_irqsave(&dl_b->lock, flags);
>-			cpus = dl_bw_cpus(cpu);
>-			overflow = __dl_overflow(dl_b, cpus, 0, 0);
>-			raw_spin_unlock_irqrestore(&dl_b->lock, flags);
>-
>-			rcu_read_unlock_sched();
>-
>-			if (overflow)
>-				return notifier_from_errno(-EBUSY);
>-		}
>+		set_cpu_active((long)hcpu, false);
> 		return NOTIFY_OK;
>+	default:
>+		return NOTIFY_DONE;
> 	}
>-
>-	return NOTIFY_DONE;
> }
> 
> static int __init migration_init(void)
>@@ -7001,7 +6978,6 @@ static int cpuset_cpu_active(struct notifier_block *nfb, unsigned long action,
> 		 */
> 
> 	case CPU_ONLINE:
>-	case CPU_DOWN_FAILED:
> 		cpuset_update_active_cpus(true);
> 		break;
> 	default:
>@@ -7013,8 +6989,32 @@ static int cpuset_cpu_active(struct notifier_block *nfb, unsigned long action,
> static int cpuset_cpu_inactive(struct notifier_block *nfb, unsigned long action,
> 			       void *hcpu)
> {
>-	switch (action) {
>+	unsigned long flags;
>+	long cpu = (long)hcpu;
>+	struct dl_bw *dl_b;
>+
>+	switch (action & ~CPU_TASKS_FROZEN) {
> 	case CPU_DOWN_PREPARE:
>+		/* explicitly allow suspend */
>+		if (!(action & CPU_TASKS_FROZEN)) {
>+			bool overflow;
>+			int cpus;
>+
>+			rcu_read_lock_sched();
>+			dl_b = dl_bw_of(cpu);
>+
>+			raw_spin_lock_irqsave(&dl_b->lock, flags);
>+			cpus = dl_bw_cpus(cpu);
>+			overflow = __dl_overflow(dl_b, cpus, 0, 0);
>+			raw_spin_unlock_irqrestore(&dl_b->lock, flags);
>+
>+			rcu_read_unlock_sched();
>+
>+			if (overflow) {
>+				trace_printk("hotplug failed for cpu %lu", cpu);
>+				return notifier_from_errno(-EBUSY);
>+			}
>+		}
> 		cpuset_update_active_cpus(false);
> 		break;
> 	case CPU_DOWN_PREPARE_FROZEN:
>-- 
>2.3.0


* Re: [PATCH RESEND v10] sched/deadline: support dl task migration during cpu hotplug
  2015-03-23  8:55               ` Peter Zijlstra
@ 2015-03-24  9:27                 ` Juri Lelli
  2015-03-24  9:13                   ` Wanpeng Li
  2015-03-30  9:12                   ` Peter Zijlstra
  0 siblings, 2 replies; 17+ messages in thread
From: Juri Lelli @ 2015-03-24  9:27 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Wanpeng Li, linux-kernel, juri.lelli

Hi,

On 23/03/2015 08:55, Peter Zijlstra wrote:
> On Mon, Mar 23, 2015 at 08:25:04AM +0100, Ingo Molnar wrote:
>>>>>>>>> +			if (cpu >= nr_cpu_ids) {
>>>>>>>>> +				if (dl_bandwidth_enabled()) {
>>>>>>>>> +					/*
>>>>>>>>> +					 * Fail to find any suitable cpu.
>>>>>>>>> +					 * The task will never come back!
>>>>>>>>> +					 */
>>>>>>>>> +					WARN_ON(1);
>>>>>>>>
>>>>>>>> Can this condition happen to users with a non-buggy kernel?
> 
>> I still haven't seen a satisfactory answer to this question. Please 
>> don't resend patches without clearing questions raised during review.
> 
> So I had a look on Friday, it _should_ not happen, but it does due to a
> second bug Juri is currently chasing down.
> 

Right, it should not happen. It happens because hotplug operations are
destructive w.r.t. cpusets. Peter, how about we move the check you put
in sched_cpu_inactive() to cpuset_cpu_inactive()? This way, if we fail,
we don't need to destroy/rebuild the domains.

Thanks,

- Juri

From 65e8033e05f8b70116747062d00d5a5c266699fb Mon Sep 17 00:00:00 2001
From: Juri Lelli <juri.lelli@gmail.com>
Date: Tue, 24 Mar 2015 07:47:03 +0000
Subject: [PATCH] sched/core: check for available -dl bandwidth in
 cpuset_cpu_inactive

Signed-off-by: Juri Lelli <juri.lelli@arm.com>
---
 kernel/sched/core.c | 56 ++++++++++++++++++++++++++---------------------------
 1 file changed, 28 insertions(+), 28 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 50927eb..3723ad0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5318,36 +5318,13 @@ static int sched_cpu_active(struct notifier_block *nfb,
 static int sched_cpu_inactive(struct notifier_block *nfb,
 					unsigned long action, void *hcpu)
 {
-	unsigned long flags;
-	long cpu = (long)hcpu;
-	struct dl_bw *dl_b;
-
 	switch (action & ~CPU_TASKS_FROZEN) {
 	case CPU_DOWN_PREPARE:
-		set_cpu_active(cpu, false);
-
-		/* explicitly allow suspend */
-		if (!(action & CPU_TASKS_FROZEN)) {
-			bool overflow;
-			int cpus;
-
-			rcu_read_lock_sched();
-			dl_b = dl_bw_of(cpu);
-
-			raw_spin_lock_irqsave(&dl_b->lock, flags);
-			cpus = dl_bw_cpus(cpu);
-			overflow = __dl_overflow(dl_b, cpus, 0, 0);
-			raw_spin_unlock_irqrestore(&dl_b->lock, flags);
-
-			rcu_read_unlock_sched();
-
-			if (overflow)
-				return notifier_from_errno(-EBUSY);
-		}
+		set_cpu_active((long)hcpu, false);
 		return NOTIFY_OK;
+	default:
+		return NOTIFY_DONE;
 	}
-
-	return NOTIFY_DONE;
 }
 
 static int __init migration_init(void)
@@ -7001,7 +6978,6 @@ static int cpuset_cpu_active(struct notifier_block *nfb, unsigned long action,
 		 */
 
 	case CPU_ONLINE:
-	case CPU_DOWN_FAILED:
 		cpuset_update_active_cpus(true);
 		break;
 	default:
@@ -7013,8 +6989,32 @@ static int cpuset_cpu_active(struct notifier_block *nfb, unsigned long action,
 static int cpuset_cpu_inactive(struct notifier_block *nfb, unsigned long action,
 			       void *hcpu)
 {
-	switch (action) {
+	unsigned long flags;
+	long cpu = (long)hcpu;
+	struct dl_bw *dl_b;
+
+	switch (action & ~CPU_TASKS_FROZEN) {
 	case CPU_DOWN_PREPARE:
+		/* explicitly allow suspend */
+		if (!(action & CPU_TASKS_FROZEN)) {
+			bool overflow;
+			int cpus;
+
+			rcu_read_lock_sched();
+			dl_b = dl_bw_of(cpu);
+
+			raw_spin_lock_irqsave(&dl_b->lock, flags);
+			cpus = dl_bw_cpus(cpu);
+			overflow = __dl_overflow(dl_b, cpus, 0, 0);
+			raw_spin_unlock_irqrestore(&dl_b->lock, flags);
+
+			rcu_read_unlock_sched();
+
+			if (overflow) {
+				trace_printk("hotplug failed for cpu %lu", cpu);
+				return notifier_from_errno(-EBUSY);
+			}
+		}
 		cpuset_update_active_cpus(false);
 		break;
 	case CPU_DOWN_PREPARE_FROZEN:
-- 
2.3.0
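
For reference, the check being moved boils down to a single comparison.
The following is a minimal user-space sketch, not the kernel code:
struct dl_bw_model, to_ratio() and dl_overflow() are simplified stand-ins
written for illustration, the 20-bit fixed-point shift is an assumption,
and the lock and RCU protection from the patch are omitted. It assumes
that "cpus" is the number of CPUs still active in the root domain once
the outgoing CPU has been marked inactive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BW_SHIFT 20	/* assumed fixed-point scale: 1.0 == 1 << 20 */

/* Simplified stand-in for the scheduler's bandwidth bookkeeping. */
struct dl_bw_model {
	int64_t  bw;		/* per-CPU limit, -1 means "no limit" */
	uint64_t total_bw;	/* sum of the admitted tasks' bandwidths */
};

/* runtime/period expressed as a fixed-point fraction. */
static uint64_t to_ratio(uint64_t period, uint64_t runtime)
{
	assert(period > 0);
	return (runtime << BW_SHIFT) / period;
}

/*
 * Returns true if the admitted bandwidth (after replacing old_bw with
 * new_bw) no longer fits on "cpus" CPUs.
 */
static bool dl_overflow(const struct dl_bw_model *b, int cpus,
			uint64_t old_bw, uint64_t new_bw)
{
	return b->bw != -1 &&
	       (uint64_t)b->bw * (uint64_t)cpus < b->total_bw - old_bw + new_bw;
}
```

With an assumed per-CPU limit of 950000/1000000 and two tasks like the
50000:100000 reproducer from the top of the thread, dl_overflow() reports
an overflow with one CPU left and none with two, so offlining one of two
CPUs would be refused with -EBUSY.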



* Re: [PATCH RESEND v10] sched/deadline: support dl task migration during cpu hotplug
  2015-03-24 10:00                     ` Juri Lelli
@ 2015-03-24  9:43                       ` Wanpeng Li
  0 siblings, 0 replies; 17+ messages in thread
From: Wanpeng Li @ 2015-03-24  9:43 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, juri.lelli, Wanpeng Li

On Tue, Mar 24, 2015 at 10:00:25AM +0000, Juri Lelli wrote:
>On 24/03/15 09:13, Wanpeng Li wrote:
>> Hi Juri,
>> On Tue, Mar 24, 2015 at 09:27:09AM +0000, Juri Lelli wrote:
>>> Hi,
>>>
>>> On 23/03/2015 08:55, Peter Zijlstra wrote:
>>>> On Mon, Mar 23, 2015 at 08:25:04AM +0100, Ingo Molnar wrote:
>>>>>>>>>>>> +			if (cpu >= nr_cpu_ids) {
>>>>>>>>>>>> +				if (dl_bandwidth_enabled()) {
>>>>>>>>>>>> +					/*
>>>>>>>>>>>> +					 * Fail to find any suitable cpu.
>>>>>>>>>>>> +					 * The task will never come back!
>>>>>>>>>>>> +					 */
>>>>>>>>>>>> +					WARN_ON(1);
>>>>>>>>>>>
>>>>>>>>>>> Can this condition happen to users with a non-buggy kernel?
>>>>
>>>>> I still haven't seen a satisfactory answer to this question. Please 
>>>>> don't resend patches without clearing questions raised during review.
>>>>
>>>> So I had a look on Friday, it _should_ not happen, but it does due to a
>>>> second bug Juri is currently chasing down.
>>>>
>>>
>>> Right, it should not happen. It happens because hotplug operations are
>>> destructive w.r.t. cpusets. Peter, how about we move the check you put
>>> in sched_cpu_inactive() to cpuset_cpu_inactive()? This way, if we fail,
>>> we don't need to destroy/rebuild the domains.
>> 
>> I remember you mentioned on IRC last week that there is a bug; does this
>> patch solve it?
>>
>
>It seems to fix it. With the previous check we correctly failed to turn
>off a cpu with a -dl task running, but only the first time. After that the
>bandwidth information associated with it was gone, and subsequent hotplug
>operations on the same cpu would turn it off.

Cool, thanks for your patch and efforts. ;-)

Regards,
Wanpeng Li 



* Re: [PATCH RESEND v10] sched/deadline: support dl task migration during cpu hotplug
  2015-03-23  7:25             ` Ingo Molnar
  2015-03-23  8:55               ` Peter Zijlstra
@ 2015-03-24  9:50               ` Wanpeng Li
  1 sibling, 0 replies; 17+ messages in thread
From: Wanpeng Li @ 2015-03-24  9:50 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Peter Zijlstra, Juri Lelli, linux-kernel, Wanpeng Li

On Mon, Mar 23, 2015 at 08:25:04AM +0100, Ingo Molnar wrote:
>I still haven't seen a satisfactory answer to this question. Please 
>don't resend patches without clearing questions raised during review.

Got it. By the way, I think v12 is the ready version. ;-)

Regards,
Wanpeng Li 

>
>Thanks,
>
>	Ingo


* Re: [PATCH RESEND v10] sched/deadline: support dl task migration during cpu hotplug
  2015-03-24  9:13                   ` Wanpeng Li
@ 2015-03-24 10:00                     ` Juri Lelli
  2015-03-24  9:43                       ` Wanpeng Li
  0 siblings, 1 reply; 17+ messages in thread
From: Juri Lelli @ 2015-03-24 10:00 UTC (permalink / raw)
  To: Wanpeng Li; +Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, juri.lelli

On 24/03/15 09:13, Wanpeng Li wrote:
> Hi Juri,
> On Tue, Mar 24, 2015 at 09:27:09AM +0000, Juri Lelli wrote:
>> Hi,
>>
>> On 23/03/2015 08:55, Peter Zijlstra wrote:
>>> On Mon, Mar 23, 2015 at 08:25:04AM +0100, Ingo Molnar wrote:
>>>>>>>>>>> +			if (cpu >= nr_cpu_ids) {
>>>>>>>>>>> +				if (dl_bandwidth_enabled()) {
>>>>>>>>>>> +					/*
>>>>>>>>>>> +					 * Fail to find any suitable cpu.
>>>>>>>>>>> +					 * The task will never come back!
>>>>>>>>>>> +					 */
>>>>>>>>>>> +					WARN_ON(1);
>>>>>>>>>>
>>>>>>>>>> Can this condition happen to users with a non-buggy kernel?
>>>
>>>> I still haven't seen a satisfactory answer to this question. Please 
>>>> don't resend patches without clearing questions raised during review.
>>>
>>> So I had a look on Friday, it _should_ not happen, but it does due to a
>>> second bug Juri is currently chasing down.
>>>
>>
>> Right, it should not happen. It happens because hotplug operations are
>> destructive w.r.t. cpusets. Peter, how about we move the check you put
>> in sched_cpu_inactive() to cpuset_cpu_inactive()? This way, if we fail,
>> we don't need to destroy/rebuild the domains.
> 
> I remember you mentioned on IRC last week that there is a bug; does this
> patch solve it?
>

It seems to fix it. With the previous check we correctly failed to turn
off a cpu with a -dl task running, but only the first time. After that the
bandwidth information associated with it was gone, and subsequent hotplug
operations on the same cpu would turn it off.
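
The "fails only the first time" behaviour can be reproduced with a toy
user-space model. This is a sketch under stated assumptions, not kernel
code: struct toy_domain, toy_rebuild_domains() and toy_cpu_down() are
hypothetical names, and the model simply assumes that rebuilding the
domains after a failed attempt forgets the admitted bandwidth.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy root domain: only the numbers the admission check looks at. */
struct toy_domain {
	uint64_t limit_per_cpu;	/* allowed bandwidth per CPU */
	uint64_t total_bw;	/* bandwidth of the admitted -dl tasks */
	int active_cpus;
};

/* Assumption: a domain rebuild loses the admitted bandwidth. */
static void toy_rebuild_domains(struct toy_domain *d)
{
	d->total_bw = 0;
}

/*
 * Try to take one CPU down; returns true if the CPU went offline.
 * rebuild_on_failure == true models the old behaviour, where a refused
 * request still ends in a domain rebuild; false models the behaviour
 * the patch aims for, where a refused request leaves the domains alone.
 */
static bool toy_cpu_down(struct toy_domain *d, bool rebuild_on_failure)
{
	int remaining;

	assert(d->active_cpus > 0);
	remaining = d->active_cpus - 1;

	if (d->limit_per_cpu * (uint64_t)remaining < d->total_bw) {
		/* Not enough bandwidth left: refuse the request. */
		if (rebuild_on_failure)
			toy_rebuild_domains(d);
		return false;
	}

	d->active_cpus = remaining;
	return true;
}
```

Starting from two CPUs with a limit of 95 each and 150 units admitted,
two calls with rebuild_on_failure set return false and then true, while
two calls with it cleared both return false and leave both CPUs online.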

Thanks,

- Juri




* Re: [PATCH RESEND v10] sched/deadline: support dl task migration during cpu hotplug
  2015-03-24  9:27                 ` Juri Lelli
  2015-03-24  9:13                   ` Wanpeng Li
@ 2015-03-30  9:12                   ` Peter Zijlstra
  2015-03-31  8:55                     ` Juri Lelli
  1 sibling, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2015-03-30  9:12 UTC (permalink / raw)
  To: Juri Lelli; +Cc: Ingo Molnar, Wanpeng Li, linux-kernel, juri.lelli

On Tue, Mar 24, 2015 at 09:27:09AM +0000, Juri Lelli wrote:
> Right, it should not happen. It happens because hotplug operations are
> destructive w.r.t. cpusets. Peter, how about we move the check you put
> in sched_cpu_inactive() to cpuset_cpu_inactive()? This way, if we fail,
> we don't need to destroy/rebuild the domains.

Seems ok to my Monday brain, could you shoot a coherent Changelog this
way so that I can copy/paste it into the patch?


* Re: [PATCH RESEND v10] sched/deadline: support dl task migration during cpu hotplug
  2015-03-30  9:12                   ` Peter Zijlstra
@ 2015-03-31  8:55                     ` Juri Lelli
  0 siblings, 0 replies; 17+ messages in thread
From: Juri Lelli @ 2015-03-31  8:55 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Wanpeng Li, linux-kernel, juri.lelli

Hi Peter,

On 30/03/15 10:12, Peter Zijlstra wrote:
> On Tue, Mar 24, 2015 at 09:27:09AM +0000, Juri Lelli wrote:
>> Right, it should not happen. It happens because hotplug operations are
>> destructive w.r.t. cpusets. Peter, how about we move the check you put
>> in sched_cpu_inactive() to cpuset_cpu_inactive()? This way, if we fail,
>> we don't need to destroy/rebuild the domains.
> 
> Seems ok to my Monday brain, could you shoot a coherent Changelog this
> way so that I can copy/paste it into the patch?
> 

Sent it out as an updated patch.

Thanks,

- Juri



end of thread

Thread overview: 17+ messages
2015-03-13  7:27 [PATCH RESEND v10] sched/deadline: support dl task migration during cpu hotplug Wanpeng Li
2015-03-16 12:09 ` Wanpeng Li
2015-03-16 15:01 ` Ingo Molnar
2015-03-16 23:01   ` Wanpeng Li
2015-03-17  8:06     ` Ingo Molnar
2015-03-17  7:53       ` Wanpeng Li
2015-03-17  8:13         ` Ingo Molnar
2015-03-17  7:59           ` Wanpeng Li
2015-03-23  7:25             ` Ingo Molnar
2015-03-23  8:55               ` Peter Zijlstra
2015-03-24  9:27                 ` Juri Lelli
2015-03-24  9:13                   ` Wanpeng Li
2015-03-24 10:00                     ` Juri Lelli
2015-03-24  9:43                       ` Wanpeng Li
2015-03-30  9:12                   ` Peter Zijlstra
2015-03-31  8:55                     ` Juri Lelli
2015-03-24  9:50               ` Wanpeng Li
