LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH] kthread: kthread_bind fails to enforce CPU affinity (fixes kernel BUG at kernel/smpboot.c:134!)
@ 2014-12-08  3:27 Anton Blanchard
  2014-12-08  4:28 ` Linus Torvalds
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Anton Blanchard @ 2014-12-08  3:27 UTC (permalink / raw)
  To: torvalds, akpm, peterz, tglx, mingo, rostedt, tj, fengguang.wu,
	rafael.j.wysocki, yuyang.du, lkp, yuanhan.liu, pjt, bsegall,
	daniel, subbaram, computersforpeace, sp
  Cc: linux-kernel, linuxppc-dev

I have a busy ppc64le KVM box where guests sometimes hit the infamous
"kernel BUG at kernel/smpboot.c:134!" issue during boot:

BUG_ON(td->cpu != smp_processor_id());

Basically a per CPU hotplug thread scheduled on the wrong CPU. The oops
output confirms it:

CPU: 0
Comm: watchdog/130

The issue is in kthread_bind where we set the cpus_allowed mask, but do
not touch task_thread_info(p)->cpu. The scheduler assumes the previously
scheduled CPU is in the cpus_allowed mask, but in this case we are
moving a thread to another CPU so it is not.

We used to call set_task_cpu which sets task_thread_info(p)->cpu (in fact
kthread_bind still has a comment suggesting this). That was removed in
e2912009fb7b ("sched: Ensure set_task_cpu() is never called on blocked
tasks").

Since we cannot call set_task_cpu (the task is in a sleeping state),
just do an explicit set of task_thread_info(p)->cpu.

Fixes: e2912009fb7b ("sched: Ensure set_task_cpu() is never called on blocked tasks")
Cc: stable@vger.kernel.org
Signed-off-by: Anton Blanchard <anton@samba.org>
---
 kernel/kthread.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/kthread.c b/kernel/kthread.c
index 10e489c..e40ab1d 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -327,13 +327,14 @@ EXPORT_SYMBOL(kthread_create_on_node);
 
 static void __kthread_bind(struct task_struct *p, unsigned int cpu, long state)
 {
-	/* Must have done schedule() in kthread() before we set_task_cpu */
+	/* Must have done schedule() in kthread() before we change affinity */
 	if (!wait_task_inactive(p, state)) {
 		WARN_ON(1);
 		return;
 	}
 	/* It's safe because the task is inactive. */
 	do_set_cpus_allowed(p, cpumask_of(cpu));
+	task_thread_info(p)->cpu = cpu;
 	p->flags |= PF_NO_SETAFFINITY;
 }
 
-- 
2.1.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] kthread: kthread_bind fails to enforce CPU affinity (fixes kernel BUG at kernel/smpboot.c:134!)
  2014-12-08  3:27 [PATCH] kthread: kthread_bind fails to enforce CPU affinity (fixes kernel BUG at kernel/smpboot.c:134!) Anton Blanchard
@ 2014-12-08  4:28 ` Linus Torvalds
  2014-12-08  4:46   ` Anton Blanchard
  2014-12-08  8:34 ` Ingo Molnar
  2014-12-08 13:54 ` [PATCH] kthread: kthread_bind fails to enforce CPU affinity " Steven Rostedt
  2 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2014-12-08  4:28 UTC (permalink / raw)
  To: Anton Blanchard
  Cc: Andrew Morton, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Steven Rostedt, Tejun Heo, Wu Fengguang, Rafael J. Wysocki,
	yuyang.du, lkp, Yuanhan Liu, Paul Turner, Benjamin Segall,
	Daniel J Blueman, subbaram, Brian Norris, sp,
	Linux Kernel Mailing List, ppc-dev

On Sun, Dec 7, 2014 at 7:27 PM, Anton Blanchard <anton@samba.org> wrote:
>
> Since we cannot call set_task_cpu (the task is in a sleeping state),
> just do an explicit set of task_thread_info(p)->cpu.

Scheduler people: is this sufficient and ok?

The __set_task_cpu() function does various other things too:

        set_task_rq(p, cpu);
  #ifdef CONFIG_SMP
        /*
         * After ->cpu is set up to a new value, task_rq_lock(p, ...) can be
         * successfuly executed on another CPU. We must ensure that updates of
         * per-task data have been completed by this moment.
         */
        smp_wmb();
        task_thread_info(p)->cpu = cpu;
        p->wake_cpu = cpu;
  #endif

which makes me worry about just setting the thread_info->cpu value.
That set_task_rq() initializes various group scheduling things, an
dthat whole "wake_cpu" thing seems relevant too.

I'm not saying the patch is wrong, I just want confirmation/ack for
it. Although to be honest, I'd rather see this come in through the
scheduler tree anyway.

Hmm? This seems to go back to 2.6.33 if that "Fixes" line is accurate,
so it's been around forever (almost exactly five years). Comments?

                    Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] kthread: kthread_bind fails to enforce CPU affinity (fixes kernel BUG at kernel/smpboot.c:134!)
  2014-12-08  4:28 ` Linus Torvalds
@ 2014-12-08  4:46   ` Anton Blanchard
  0 siblings, 0 replies; 11+ messages in thread
From: Anton Blanchard @ 2014-12-08  4:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	Steven Rostedt, Tejun Heo, Wu Fengguang, Rafael J. Wysocki,
	yuyang.du, lkp, Yuanhan Liu, Paul Turner, Benjamin Segall,
	Daniel J Blueman, subbaram, Brian Norris, sp,
	Linux Kernel Mailing List, ppc-dev


Hi Linus,

> The __set_task_cpu() function does various other things too:
> 
>         set_task_rq(p, cpu);
>   #ifdef CONFIG_SMP
>         /*
>          * After ->cpu is set up to a new value, task_rq_lock(p, ...)
> can be
>          * successfuly executed on another CPU. We must ensure that
> updates of
>          * per-task data have been completed by this moment.
>          */
>         smp_wmb();
>         task_thread_info(p)->cpu = cpu;
>         p->wake_cpu = cpu;
>   #endif
> 
> which makes me worry about just setting the thread_info->cpu value.
> That set_task_rq() initializes various group scheduling things, an
> dthat whole "wake_cpu" thing seems relevant too.

Yeah, I would definitely like the scheduler guys to weigh in on this,
especially considering how difficult it can be to hit.

Anton

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] kthread: kthread_bind fails to enforce CPU affinity (fixes kernel BUG at kernel/smpboot.c:134!)
  2014-12-08  3:27 [PATCH] kthread: kthread_bind fails to enforce CPU affinity (fixes kernel BUG at kernel/smpboot.c:134!) Anton Blanchard
  2014-12-08  4:28 ` Linus Torvalds
@ 2014-12-08  8:34 ` Ingo Molnar
  2014-12-08 10:18   ` Anton Blanchard
  2014-12-08 13:54 ` [PATCH] kthread: kthread_bind fails to enforce CPU affinity " Steven Rostedt
  2 siblings, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2014-12-08  8:34 UTC (permalink / raw)
  To: Anton Blanchard
  Cc: torvalds, akpm, peterz, tglx, mingo, rostedt, tj, fengguang.wu,
	rafael.j.wysocki, yuyang.du, lkp, yuanhan.liu, pjt, bsegall,
	daniel, subbaram, computersforpeace, sp, linux-kernel,
	linuxppc-dev


* Anton Blanchard <anton@samba.org> wrote:

> I have a busy ppc64le KVM box where guests sometimes hit the 
> infamous "kernel BUG at kernel/smpboot.c:134!" issue during 
> boot:
> 
> BUG_ON(td->cpu != smp_processor_id());
> 
> Basically a per CPU hotplug thread scheduled on the wrong CPU. The oops
> output confirms it:
> 
> CPU: 0
> Comm: watchdog/130
> 
> The issue is in kthread_bind where we set the cpus_allowed 
> mask, but do not touch task_thread_info(p)->cpu. The scheduler 
> assumes the previously scheduled CPU is in the cpus_allowed 
> mask, but in this case we are moving a thread to another CPU so 
> it is not.
> 
> We used to call set_task_cpu which sets 
> task_thread_info(p)->cpu (in fact kthread_bind still has a 
> comment suggesting this). That was removed in e2912009fb7b 
> ("sched: Ensure set_task_cpu() is never called on blocked 
> tasks").
> 
> Since we cannot call set_task_cpu (the task is in a sleeping 
> state), just do an explicit set of task_thread_info(p)->cpu.

So we cannot call set_task_cpu() because in the normal life time 
of a task the ->cpu value gets set on wakeup. So if a task is 
blocked right now, and its affinity changes, it ought to get a 
correct ->cpu selected on wakeup. The affinity mask and the 
current value of ->cpu getting out of sync is thus 'normal'.

(Check for example how set_cpus_allowed_ptr() works: we first set 
the new allowed mask, then do we migrate the task away if 
necessary.)

In the kthread_bind() case this is explicitly assumed: it only 
calls do_set_cpus_allowed().

But obviously the bug triggers in kernel/smpboot.c, and that 
assert shows a real bug - and your patch makes the assert go 
away, so the question is, how did the kthread get woken up and 
put on a runqueue without its ->cpu getting set?

One possibility is a generic scheduler bug in ttwu(), resulting 
in ->cpu not getting set properly. If this was the case then 
other places would be blowing up as well, and I don't think we 
are seeing this currently, especially not over such a long 
timespan.

Another possibility would be that kthread_bind()'s assumption 
that the task is inactive is false: if the task activates when we 
think it's blocked and we just hotplug-migrate it away while its 
running (setting its td->cpu?), the assert could trigger I think 
- and the patch would make the assert go away.

A third possibility would be, if this is a freshly created 
thread, some sort of initialization race - either in the kthread 
or in the scheduler code.

Weird.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] kthread: kthread_bind fails to enforce CPU affinity (fixes kernel BUG at kernel/smpboot.c:134!)
  2014-12-08  8:34 ` Ingo Molnar
@ 2014-12-08 10:18   ` Anton Blanchard
  2014-12-08 23:58     ` [PATCH] powerpc: secondary CPUs signal to master before setting active and online " Anton Blanchard
  0 siblings, 1 reply; 11+ messages in thread
From: Anton Blanchard @ 2014-12-08 10:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: torvalds, akpm, peterz, tglx, mingo, rostedt, tj, fengguang.wu,
	rafael.j.wysocki, yuyang.du, lkp, yuanhan.liu, pjt, bsegall,
	daniel, subbaram, computersforpeace, sp, linux-kernel,
	linuxppc-dev


Hi Ingo,

> So we cannot call set_task_cpu() because in the normal life time 
> of a task the ->cpu value gets set on wakeup. So if a task is 
> blocked right now, and its affinity changes, it ought to get a 
> correct ->cpu selected on wakeup. The affinity mask and the 
> current value of ->cpu getting out of sync is thus 'normal'.
> 
> (Check for example how set_cpus_allowed_ptr() works: we first set 
> the new allowed mask, then do we migrate the task away if 
> necessary.)
> 
> In the kthread_bind() case this is explicitly assumed: it only 
> calls do_set_cpus_allowed().
> 
> But obviously the bug triggers in kernel/smpboot.c, and that 
> assert shows a real bug - and your patch makes the assert go 
> away, so the question is, how did the kthread get woken up and 
> put on a runqueue without its ->cpu getting set?

I started going down this line earlier today, and found things like:

select_task_rq_fair:

        if (p->nr_cpus_allowed == 1)
                return prev_cpu;

I tried returning cpumask_first(tsk_cpus_allowed()) instead, and while
I couldn't hit the BUG I did manage to get a scheduler lockup during
testing.

At that point I thought the previous task_cpu() was somewhat ingrained
in the scheduler and came up with the patch. If not, we could go on a
hunt to see what else needs fixing.

Anton

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] kthread: kthread_bind fails to enforce CPU affinity (fixes kernel BUG at kernel/smpboot.c:134!)
  2014-12-08  3:27 [PATCH] kthread: kthread_bind fails to enforce CPU affinity (fixes kernel BUG at kernel/smpboot.c:134!) Anton Blanchard
  2014-12-08  4:28 ` Linus Torvalds
  2014-12-08  8:34 ` Ingo Molnar
@ 2014-12-08 13:54 ` Steven Rostedt
  2014-12-09  2:24   ` Lai Jiangshan
  2 siblings, 1 reply; 11+ messages in thread
From: Steven Rostedt @ 2014-12-08 13:54 UTC (permalink / raw)
  To: Anton Blanchard
  Cc: torvalds, akpm, peterz, tglx, mingo, tj, fengguang.wu,
	rafael.j.wysocki, yuyang.du, lkp, yuanhan.liu, pjt, bsegall,
	daniel, subbaram, computersforpeace, sp, linux-kernel,
	linuxppc-dev

On Mon,  8 Dec 2014 14:27:01 +1100
Anton Blanchard <anton@samba.org> wrote:

> I have a busy ppc64le KVM box where guests sometimes hit the infamous
> "kernel BUG at kernel/smpboot.c:134!" issue during boot:
> 
> BUG_ON(td->cpu != smp_processor_id());
> 
> Basically a per CPU hotplug thread scheduled on the wrong CPU. The oops
> output confirms it:
> 
> CPU: 0
> Comm: watchdog/130
> 
> The issue is in kthread_bind where we set the cpus_allowed mask, but do
> not touch task_thread_info(p)->cpu. The scheduler assumes the previously
> scheduled CPU is in the cpus_allowed mask, but in this case we are
> moving a thread to another CPU so it is not.
> 

Does this happen always on boot up, and always with the watchdog thread?

I followed the logic that starts the watchdog threads.

watchdog_enable_all_cpus()
  smpboot_register_percpu-thread() {

    for_each_online_cpu(cpu) { ... }

Where watchdog_enable_all_cpus() can be called by
lockup_detector_init() before SMP is started, but also by
proc_dowatchdog() which is called by the sysctl commands (after SMP is
up and running).

I noticed there's no "get_online_cpus()" anywhere, although the
unregister_percpu_thread() has it. Is it possible that we created a
thread on a CPU that wasn't fully online yet?

Perhaps the following patch is needed? Even if this isn't the solution
to this bug, it is probably needed as watchdog_enable_all_cpus() can be
called after boot up too.

-- Steve

diff --git a/kernel/smpboot.c b/kernel/smpboot.c
index eb89e1807408..60d35ac5d3f1 100644
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -279,6 +279,7 @@ int smpboot_register_percpu_thread(struct smp_hotplug_thread *plug_thread)
 	unsigned int cpu;
 	int ret = 0;
 
+	get_online_cpus();
 	mutex_lock(&smpboot_threads_lock);
 	for_each_online_cpu(cpu) {
 		ret = __smpboot_create_thread(plug_thread, cpu);
@@ -291,6 +292,7 @@ int smpboot_register_percpu_thread(struct smp_hotplug_thread *plug_thread)
 	list_add(&plug_thread->list, &hotplug_threads);
 out:
 	mutex_unlock(&smpboot_threads_lock);
+	put_online_cpus();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(smpboot_register_percpu_thread);

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH] powerpc: secondary CPUs signal to master before setting active and online (fixes kernel BUG at kernel/smpboot.c:134!)
  2014-12-08 10:18   ` Anton Blanchard
@ 2014-12-08 23:58     ` Anton Blanchard
  2014-12-09 20:54       ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: Anton Blanchard @ 2014-12-08 23:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: torvalds, akpm, peterz, tglx, mingo, rostedt, tj, fengguang.wu,
	rafael.j.wysocki, yuyang.du, lkp, yuanhan.liu, pjt, bsegall,
	daniel, subbaram, computersforpeace, sp, linux-kernel,
	linuxppc-dev, benh, paulus, mpe

Hi Ingo,

> At that point I thought the previous task_cpu() was somewhat ingrained
> in the scheduler and came up with the patch. If not, we could go on a
> hunt to see what else needs fixing.

I had another look. The scheduled does indeed make assumptions about the
previous task_cpu, but we have a hammer to fix it up called
select_fallback_rq.

I annotated select_fallback_rq, and did hit a case where the CPU was
not active. ppc64 patch below.

I think x86 have a similar (although harder to hit) issue. While it
does wait for the cpu_online bit to be set:

        while (!cpu_online(cpu)) {
                cpu_relax();
                touch_nmi_watchdog();
        }

The cpu_active bit is set after the cpu_online bit:

void set_cpu_online(unsigned int cpu, bool online)
{
        if (online) {
                cpumask_set_cpu(cpu, to_cpumask(cpu_online_bits));
                cpumask_set_cpu(cpu, to_cpumask(cpu_active_bits));

If the CPU got delayed between the two stores (eg a KVM guest had the CPU
scheduled out), then we'd end up with cpu_active unset and hit the same
issue in select_fallback_rq.

Anton
--

I have a busy ppc64le KVM box where guests sometimes hit the infamous
"kernel BUG at kernel/smpboot.c:134!" issue during boot:

BUG_ON(td->cpu != smp_processor_id());

Basically a per CPU hotplug thread scheduled on the wrong CPU. The oops
output confirms it:

CPU: 0
Comm: watchdog/130

The problem is that we aren't ensuring the CPU active and online bits are set
before allowing the master to continue on. The master unparks the secondary
CPUs kthreads and the scheduler looks for a CPU to run on. It calls
select_task_rq and realises the suggested CPU is not in the cpus_allowed
mask. It then ends up in select_fallback_rq, and since the active and
online bits aren't set we choose some other CPU to run on.

Cc: stable@vger.kernel.org
Signed-off-by: Anton Blanchard <anton@samba.org>
---
 arch/powerpc/kernel/smp.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 71e186d..d40e46e 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -700,7 +700,6 @@ void start_secondary(void *unused)
 	smp_store_cpu_info(cpu);
 	set_dec(tb_ticks_per_jiffy);
 	preempt_disable();
-	cpu_callin_map[cpu] = 1;
 
 	if (smp_ops->setup_cpu)
 		smp_ops->setup_cpu(cpu);
@@ -739,6 +738,14 @@ void start_secondary(void *unused)
 	notify_cpu_starting(cpu);
 	set_cpu_online(cpu, true);
 
+	/*
+	 * CPU must be marked active and online before we signal back to the
+	 * master, because the scheduler needs to see the cpu_online and
+	 * cpu_active bits set.
+	 */
+	smp_wmb();
+	cpu_callin_map[cpu] = 1;
+
 	local_irq_enable();
 
 	cpu_startup_entry(CPUHP_ONLINE);
-- 
2.1.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] kthread: kthread_bind fails to enforce CPU affinity (fixes kernel BUG at kernel/smpboot.c:134!)
  2014-12-08 13:54 ` [PATCH] kthread: kthread_bind fails to enforce CPU affinity " Steven Rostedt
@ 2014-12-09  2:24   ` Lai Jiangshan
  0 siblings, 0 replies; 11+ messages in thread
From: Lai Jiangshan @ 2014-12-09  2:24 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Anton Blanchard, torvalds, akpm, peterz, tglx, mingo, tj,
	fengguang.wu, rafael.j.wysocki, yuyang.du, lkp, yuanhan.liu, pjt,
	bsegall, daniel, subbaram, computersforpeace, sp, linux-kernel,
	linuxppc-dev

On 12/08/2014 09:54 PM, Steven Rostedt wrote:
> On Mon,  8 Dec 2014 14:27:01 +1100
> Anton Blanchard <anton@samba.org> wrote:
> 
>> I have a busy ppc64le KVM box where guests sometimes hit the infamous
>> "kernel BUG at kernel/smpboot.c:134!" issue during boot:
>>
>> BUG_ON(td->cpu != smp_processor_id());
>>
>> Basically a per CPU hotplug thread scheduled on the wrong CPU. The oops
>> output confirms it:
>>
>> CPU: 0
>> Comm: watchdog/130
>>
>> The issue is in kthread_bind where we set the cpus_allowed mask, but do
>> not touch task_thread_info(p)->cpu. The scheduler assumes the previously
>> scheduled CPU is in the cpus_allowed mask, but in this case we are
>> moving a thread to another CPU so it is not.
>>
> 
> Does this happen always on boot up, and always with the watchdog thread?
> 
> I followed the logic that starts the watchdog threads.
> 
> watchdog_enable_all_cpus()
>   smpboot_register_percpu-thread() {
> 
>     for_each_online_cpu(cpu) { ... }
> 
> Where watchdog_enable_all_cpus() can be called by
> lockup_detector_init() before SMP is started, but also by
> proc_dowatchdog() which is called by the sysctl commands (after SMP is
> up and running).
> 
> I noticed there's no "get_online_cpus()" anywhere, although the
> unregister_percpu_thread() has it. Is it possible that we created a
> thread on a CPU that wasn't fully online yet?
> 
> Perhaps the following patch is needed? Even if this isn't the solution
> to this bug, it is probably needed as watchdog_enable_all_cpus() can be
> called after boot up too.
> 
> -- Steve


Hi, Steven, tglx

See this https://lkml.org/lkml/2014/7/30/804
"[PATCH] smpboot: add missing get_online_cpus() when register"


Thanks,
Lai

> 
> diff --git a/kernel/smpboot.c b/kernel/smpboot.c
> index eb89e1807408..60d35ac5d3f1 100644
> --- a/kernel/smpboot.c
> +++ b/kernel/smpboot.c
> @@ -279,6 +279,7 @@ int smpboot_register_percpu_thread(struct smp_hotplug_thread *plug_thread)
>  	unsigned int cpu;
>  	int ret = 0;
>  
> +	get_online_cpus();
>  	mutex_lock(&smpboot_threads_lock);
>  	for_each_online_cpu(cpu) {
>  		ret = __smpboot_create_thread(plug_thread, cpu);
> @@ -291,6 +292,7 @@ int smpboot_register_percpu_thread(struct smp_hotplug_thread *plug_thread)
>  	list_add(&plug_thread->list, &hotplug_threads);
>  out:
>  	mutex_unlock(&smpboot_threads_lock);
> +	put_online_cpus();
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(smpboot_register_percpu_thread);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> .
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] powerpc: secondary CPUs signal to master before setting active and online (fixes kernel BUG at kernel/smpboot.c:134!)
  2014-12-08 23:58     ` [PATCH] powerpc: secondary CPUs signal to master before setting active and online " Anton Blanchard
@ 2014-12-09 20:54       ` Linus Torvalds
  2014-12-10 14:08         ` Thomas Gleixner
  2014-12-10 23:06         ` Michael Ellerman
  0 siblings, 2 replies; 11+ messages in thread
From: Linus Torvalds @ 2014-12-09 20:54 UTC (permalink / raw)
  To: Anton Blanchard
  Cc: Ingo Molnar, Andrew Morton, Peter Zijlstra, Thomas Gleixner,
	Ingo Molnar, Steven Rostedt, Tejun Heo, Wu Fengguang,
	Rafael J. Wysocki, yuyang.du, lkp, Yuanhan Liu, Paul Turner,
	Benjamin Segall, Daniel J Blueman, subbaram, Brian Norris,
	Slava Pestov, Linux Kernel Mailing List, ppc-dev,
	Benjamin Herrenschmidt, Paul Mackerras, mpe

On Mon, Dec 8, 2014 at 3:58 PM, Anton Blanchard <anton@samba.org> wrote:
> Hi Ingo,
>
>> At that point I thought the previous task_cpu() was somewhat ingrained
>> in the scheduler and came up with the patch. If not, we could go on a
>> hunt to see what else needs fixing.
>
> I had another look. The scheduled does indeed make assumptions about the
> previous task_cpu, but we have a hammer to fix it up called
> select_fallback_rq.
>
> I annotated select_fallback_rq, and did hit a case where the CPU was
> not active. ppc64 patch below.

Anton, I'll assume I will get this through the usual powerpc pull requests?

> I think x86 have a similar (although harder to hit) issue.

Ingo?

                         Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] powerpc: secondary CPUs signal to master before setting active and online (fixes kernel BUG at kernel/smpboot.c:134!)
  2014-12-09 20:54       ` Linus Torvalds
@ 2014-12-10 14:08         ` Thomas Gleixner
  2014-12-10 23:06         ` Michael Ellerman
  1 sibling, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2014-12-10 14:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Anton Blanchard, Ingo Molnar, Andrew Morton, Peter Zijlstra,
	Ingo Molnar, Steven Rostedt, Tejun Heo, Wu Fengguang,
	Rafael J. Wysocki, yuyang.du, lkp, Yuanhan Liu, Paul Turner,
	Benjamin Segall, Daniel J Blueman, subbaram, Brian Norris,
	Slava Pestov, Linux Kernel Mailing List, ppc-dev,
	Benjamin Herrenschmidt, Paul Mackerras, mpe

On Tue, 9 Dec 2014, Linus Torvalds wrote:
> On Mon, Dec 8, 2014 at 3:58 PM, Anton Blanchard <anton@samba.org> wrote:
> > Hi Ingo,
> >
> >> At that point I thought the previous task_cpu() was somewhat ingrained
> >> in the scheduler and came up with the patch. If not, we could go on a
> >> hunt to see what else needs fixing.
> >
> > I had another look. The scheduled does indeed make assumptions about the
> > previous task_cpu, but we have a hammer to fix it up called
> > select_fallback_rq.
> >
> > I annotated select_fallback_rq, and did hit a case where the CPU was
> > not active. ppc64 patch below.
> 
> Anton, I'll assume I will get this through the usual powerpc pull requests?
> 
> > I think x86 have a similar (although harder to hit) issue.

Indeed way harder to hit:

CPU 0				CPU 1

    				set_cpu_online(1, true) {
while (!cpu_online(cpu1))	   cpumask_set_cpu(1, to_cpumask(cpu_online_bits));
      relax();

wakeup_thread_on_cpu1();
				   cpumask_set_cpu(1, to_cpumask(cpu_active_bits));

On bare metal probably impossible, but on virt it should be
observable. Fix is simple.

Thanks,

	tglx

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 668d8f2a8781..534f3384f03f 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -222,7 +222,6 @@ static void notrace start_secondary(void *unused)
 	lock_vector_lock();
 	set_cpu_online(smp_processor_id(), true);
 	unlock_vector_lock();
-	per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
 	x86_platform.nmi_init();
 
 	/* enable local interrupts */
@@ -234,6 +233,7 @@ static void notrace start_secondary(void *unused)
 	x86_cpuinit.setup_percpu_clockev();
 
 	wmb();
+	per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
 	cpu_startup_entry(CPUHP_ONLINE);
 }
 
@@ -932,7 +932,7 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
 	check_tsc_sync_source(cpu);
 	local_irq_restore(flags);
 
-	while (!cpu_online(cpu)) {
+	while (per_cpu(cpu_state,cpu) != CPU_ONLINE) {
 		cpu_relax();
 		touch_nmi_watchdog();
 	}



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] powerpc: secondary CPUs signal to master before setting active and online (fixes kernel BUG at kernel/smpboot.c:134!)
  2014-12-09 20:54       ` Linus Torvalds
  2014-12-10 14:08         ` Thomas Gleixner
@ 2014-12-10 23:06         ` Michael Ellerman
  1 sibling, 0 replies; 11+ messages in thread
From: Michael Ellerman @ 2014-12-10 23:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Anton Blanchard, Ingo Molnar, Andrew Morton, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Steven Rostedt, Tejun Heo,
	Wu Fengguang, Rafael J. Wysocki, yuyang.du, lkp, Yuanhan Liu,
	Paul Turner, Benjamin Segall, Daniel J Blueman, subbaram,
	Brian Norris, Slava Pestov, Linux Kernel Mailing List, ppc-dev,
	Benjamin Herrenschmidt, Paul Mackerras

On Tue, 2014-12-09 at 12:54 -0800, Linus Torvalds wrote:
> On Mon, Dec 8, 2014 at 3:58 PM, Anton Blanchard <anton@samba.org> wrote:
> > Hi Ingo,
> >
> >> At that point I thought the previous task_cpu() was somewhat ingrained
> >> in the scheduler and came up with the patch. If not, we could go on a
> >> hunt to see what else needs fixing.
> >
> > I had another look. The scheduled does indeed make assumptions about the
> > previous task_cpu, but we have a hammer to fix it up called
> > select_fallback_rq.
> >
> > I annotated select_fallback_rq, and did hit a case where the CPU was
> > not active. ppc64 patch below.
> 
> Anton, I'll assume I will get this through the usual powerpc pull requests?

Yeah I'll put it in my tree unless Anton objects.

cheers



^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2014-12-10 23:06 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-08  3:27 [PATCH] kthread: kthread_bind fails to enforce CPU affinity (fixes kernel BUG at kernel/smpboot.c:134!) Anton Blanchard
2014-12-08  4:28 ` Linus Torvalds
2014-12-08  4:46   ` Anton Blanchard
2014-12-08  8:34 ` Ingo Molnar
2014-12-08 10:18   ` Anton Blanchard
2014-12-08 23:58     ` [PATCH] powerpc: secondary CPUs signal to master before setting active and online " Anton Blanchard
2014-12-09 20:54       ` Linus Torvalds
2014-12-10 14:08         ` Thomas Gleixner
2014-12-10 23:06         ` Michael Ellerman
2014-12-08 13:54 ` [PATCH] kthread: kthread_bind fails to enforce CPU affinity " Steven Rostedt
2014-12-09  2:24   ` Lai Jiangshan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).