LKML Archive on lore.kernel.org
* [PATCH v1] CPU hotplug: active_reader not woken up in some cases - deadlock
@ 2014-12-08 18:13 David Hildenbrand
  2014-12-08 18:15 ` David Hildenbrand
  2014-12-08 18:31 ` Paul E. McKenney
  0 siblings, 2 replies; 7+ messages in thread
From: David Hildenbrand @ 2014-12-08 18:13 UTC
  To: linux-kernel
  Cc: heiko.carstens, dahi, borntraeger, rafael.j.wysocki, paulmck,
	peterz, srivatsa.bhat, oleg, bp, jkosina

Commit b2c4623dcd07 ("rcu: More on deadlock between CPU hotplug and expedited
grace periods") introduced another problem that can easily be reproduced by
starting/stopping CPUs in a loop.

E.g.:
  for i in `seq 5000`; do
      echo 1 > /sys/devices/system/cpu/cpu1/online
      echo 0 > /sys/devices/system/cpu/cpu1/online
  done

This results in:
  INFO: task /cpu_start_stop:1 blocked for more than 120 seconds.
  Call Trace:
  ([<00000000006a028e>] __schedule+0x406/0x91c)
   [<0000000000130f60>] cpu_hotplug_begin+0xd0/0xd4
   [<0000000000130ff6>] _cpu_up+0x3e/0x1c4
   [<0000000000131232>] cpu_up+0xb6/0xd4
   [<00000000004a5720>] device_online+0x80/0xc0
   [<00000000004a57f0>] online_store+0x90/0xb0
  ...

The end result is a deadlock.

The problem is that if the last reference dropped in put_online_cpus() cannot
take cpu_hotplug.lock, the puts_pending count is incremented, but a sleeping
active_writer might never be woken up and therefore never exits the loop in
cpu_hotplug_begin().

This quick fix proactively wakes up the active_writer. The writer already goes
back to sleep if the refcount has not yet dropped to 0, so a spurious wakeup
should be harmless.
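The interleaving above can be replayed in a small, single-threaded C model. This is a minimal sketch under stated assumptions, not kernel code: struct hotplug_model, enum outcome, writer_check(), reader_put_contended() and replay() are hypothetical names, and the model assumes the writer holds cpu_hotplug.lock while it folds puts_pending into refcount and marks itself as sleeping, which is why the last reader's mutex_trylock() fails.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, heavily simplified model of the cpu_hotplug state.
 * Only the field names follow the mail; everything else is assumed.
 */
struct hotplug_model {
	int refcount;       /* references taken via get_online_cpus() */
	int puts_pending;   /* puts whose mutex_trylock() failed */
	bool mutex_held;    /* cpu_hotplug.lock, held by the writer here */
	bool writer_asleep; /* writer has set TASK_UNINTERRUPTIBLE */
};

enum outcome {
	WRITER_PROCEEDS, /* refcount reached 0, the loop is exited */
	WRITER_RESLEEPS, /* woken, but a reader still holds a reference */
	WRITER_STUCK,    /* asleep, and nothing in this replay wakes it */
};

/* One pass of the writer loop: fold pending puts into refcount. */
static bool writer_check(struct hotplug_model *m)
{
	assert(m->mutex_held);
	m->refcount -= m->puts_pending;
	m->puts_pending = 0;
	return m->refcount == 0;
}

/* put_online_cpus() when mutex_trylock() fails because the writer holds it. */
static void reader_put_contended(struct hotplug_model *m, bool wake_writer)
{
	assert(m->mutex_held);
	m->puts_pending++;
	if (wake_writer)              /* the v1 fix: wake_up_process() */
		m->writer_asleep = false;
}

/*
 * Replays the problematic interleaving: 'readers' references are held and
 * one of them is dropped while the writer holds the mutex.
 */
static enum outcome replay(int readers, bool with_fix)
{
	struct hotplug_model m = { .refcount = readers };

	m.mutex_held = true;                /* writer: mutex_lock() */
	if (writer_check(&m))
		return WRITER_PROCEEDS;
	m.writer_asleep = true;             /* writer: __set_current_state() */
	reader_put_contended(&m, with_fix); /* a reader drops its reference */
	m.mutex_held = false;               /* writer: mutex_unlock(); schedule() */

	if (m.writer_asleep)
		return WRITER_STUCK;

	m.mutex_held = true;                /* writer loops: mutex_lock() again */
	return writer_check(&m) ? WRITER_PROCEEDS : WRITER_RESLEEPS;
}
```

Under these assumptions, replay(1, false) ends in WRITER_STUCK, replay(1, true) ends in WRITER_PROCEEDS, and replay(2, true) ends in WRITER_RESLEEPS: the extra wakeup is absorbed because the writer rechecks refcount before leaving the loop.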

I can no longer reproduce the hang with this fix applied.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
---
 kernel/cpu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 90a3d01..e77740583 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -117,6 +117,9 @@ void put_online_cpus(void)
 		return;
 	if (!mutex_trylock(&cpu_hotplug.lock)) {
 		atomic_inc(&cpu_hotplug.puts_pending);
+		/* we might be the last one */
+		if (unlikely(cpu_hotplug.active_writer))
+			wake_up_process(cpu_hotplug.active_writer);
 		cpuhp_lock_release();
 		return;
 	}
-- 
1.8.5.5

* Re: [PATCH v1] CPU hotplug: active_reader not woken up in some cases - deadlock
  2014-12-08 18:13 [PATCH v1] CPU hotplug: active_reader not woken up in some cases - deadlock David Hildenbrand
@ 2014-12-08 18:15 ` David Hildenbrand
  2014-12-08 18:31 ` Paul E. McKenney
  1 sibling, 0 replies; 7+ messages in thread
From: David Hildenbrand @ 2014-12-08 18:15 UTC
  To: linux-kernel
  Cc: heiko.carstens, borntraeger, rafael.j.wysocki, paulmck, peterz,
	srivatsa.bhat, oleg, bp, jkosina

The title should of course say active_writer, not active_reader ... grml

David

* Re: [PATCH v1] CPU hotplug: active_reader not woken up in some cases - deadlock
  2014-12-08 18:13 [PATCH v1] CPU hotplug: active_reader not woken up in some cases - deadlock David Hildenbrand
  2014-12-08 18:15 ` David Hildenbrand
@ 2014-12-08 18:31 ` Paul E. McKenney
  2014-12-08 18:58   ` David Hildenbrand
  1 sibling, 1 reply; 7+ messages in thread
From: Paul E. McKenney @ 2014-12-08 18:31 UTC
  To: David Hildenbrand
  Cc: linux-kernel, heiko.carstens, borntraeger, rafael.j.wysocki,
	peterz, srivatsa.bhat, oleg, bp, jkosina

On Mon, Dec 08, 2014 at 07:13:03PM +0100, David Hildenbrand wrote:
> Commit b2c4623dcd07 ("rcu: More on deadlock between CPU hotplug and expedited
> grace periods") introduced another problem that can easily be reproduced by
> starting/stopping cpus in a loop.
> 
> E.g.:
>   for i in `seq 5000`; do
>       echo 1 > /sys/devices/system/cpu/cpu1/online
>       echo 0 > /sys/devices/system/cpu/cpu1/online
>   done
> 
> Will result in:
>   INFO: task /cpu_start_stop:1 blocked for more than 120 seconds.
>   Call Trace:
>   ([<00000000006a028e>] __schedule+0x406/0x91c)
>    [<0000000000130f60>] cpu_hotplug_begin+0xd0/0xd4
>    [<0000000000130ff6>] _cpu_up+0x3e/0x1c4
>    [<0000000000131232>] cpu_up+0xb6/0xd4
>    [<00000000004a5720>] device_online+0x80/0xc0
>    [<00000000004a57f0>] online_store+0x90/0xb0
>   ...
> 
> And a deadlock.
> 
> Problem is that if the last ref in put_online_cpus() can't get the
> cpu_hotplug.lock the puts_pending count is incremented, but a sleeping active_writer
> might never be woken up, therefore never exiting the loop in cpu_hotplug_begin().
> 
> This quick fix wakes up the active_writer proactively. The writer already
> goes back to sleep if the ref count isn't already down to 0, so this should be
> fine.
> 
> Can't reproduce the error with this fix.

Good catch!

But don't we need to use exactly the same value for the NULL check
and for the wakeup?  Otherwise, wouldn't it be possible for
cpu_hotplug.active_writer to be non-NULL for the check but NULL
for the wake_up_process()?
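The race in this question can be made concrete with a deterministic C sketch. Everything here is hypothetical: struct task stands in for struct task_struct, model_wake_up_process() for wake_up_process(), and the callback passed to v1_wakeup() marks the point between the two loads where another CPU could run cpu_hotplug_done() and clear active_writer.

```c
#include <assert.h>
#include <stddef.h>

struct task { int woken; };        /* hypothetical task_struct stand-in */

static struct task *active_writer; /* models cpu_hotplug.active_writer */

/* Models wake_up_process(); returns -1 where the real call would
 * dereference a NULL pointer. */
static int model_wake_up_process(struct task *p)
{
	if (p == NULL)
		return -1;
	p->woken++;
	return 0;
}

/* Models cpu_hotplug_done() running on another CPU at an unlucky moment. */
static void writer_finishes(void) { active_writer = NULL; }

/* Models the lucky case: nothing happens between the two loads. */
static void nothing_happens(void) { }

/*
 * The v1 pattern: one load of active_writer for the check and a second
 * load for the call. 'between' runs where another CPU could interfere.
 */
static int v1_wakeup(void (*between)(void))
{
	if (active_writer) {
		between();
		return model_wake_up_process(active_writer);
	}
	return 0;
}
```

With a task installed, v1_wakeup(nothing_happens) wakes it, while v1_wakeup(writer_finishes) reaches the wakeup with a NULL pointer and returns -1, which is exactly the case in question.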

							Thanx, Paul

* Re: [PATCH v1] CPU hotplug: active_reader not woken up in some cases - deadlock
  2014-12-08 18:31 ` Paul E. McKenney
@ 2014-12-08 18:58   ` David Hildenbrand
  2014-12-08 19:08     ` Paul E. McKenney
  0 siblings, 1 reply; 7+ messages in thread
From: David Hildenbrand @ 2014-12-08 18:58 UTC
  To: Paul E. McKenney
  Cc: linux-kernel, heiko.carstens, borntraeger, rafael.j.wysocki,
	peterz, srivatsa.bhat, oleg, bp, jkosina

> Good catch!
> 
> But don't we need to use exactly the same value for the NULL check
> and for the wakeup?  Otherwise, wouldn't it be possible for
> cpu_hotplug.active_writer to be non-NULL for the check but NULL
> for the wake_up_process()?
> 
> 							Thanx, Paul

active_writer is cleared while holding cpuhp_lock, so this should be safe,
right?

Thanks!

* Re: [PATCH v1] CPU hotplug: active_reader not woken up in some cases - deadlock
  2014-12-08 18:58   ` David Hildenbrand
@ 2014-12-08 19:08     ` Paul E. McKenney
  2014-12-08 19:30       ` David Hildenbrand
  0 siblings, 1 reply; 7+ messages in thread
From: Paul E. McKenney @ 2014-12-08 19:08 UTC
  To: David Hildenbrand
  Cc: linux-kernel, heiko.carstens, borntraeger, rafael.j.wysocki,
	peterz, srivatsa.bhat, oleg, bp, jkosina

On Mon, Dec 08, 2014 at 07:58:14PM +0100, David Hildenbrand wrote:
> active_writer is cleared while holding cpuhp_lock, so this should be safe,
> right?

You lost me on that one.  Don't we get to that piece of code precisely
because we don't hold any of the CPU-hotplug locks?  If so, the
writer might well hold all the locks it needs, and might well change
cpu_hotplug.active_writer out from under us.

What am I missing here?

							Thanx, Paul

* Re: [PATCH v1] CPU hotplug: active_reader not woken up in some cases - deadlock
  2014-12-08 19:08     ` Paul E. McKenney
@ 2014-12-08 19:30       ` David Hildenbrand
  2014-12-08 20:47         ` Paul E. McKenney
  0 siblings, 1 reply; 7+ messages in thread
From: David Hildenbrand @ 2014-12-08 19:30 UTC
  To: Paul E. McKenney
  Cc: linux-kernel, heiko.carstens, borntraeger, rafael.j.wysocki,
	peterz, srivatsa.bhat, oleg, bp, jkosina

> > active_writer is cleared while holding cpuhp_lock, so this should be safe,
> > right?
> 
> You lost me on that one.  Don't we get to that piece of code precisely
> because we don't hold any of the CPU-hotplug locks?  If so, the
> writer might well hold all the locks it needs, and might well change
> cpu_hotplug.active_writer out from under us.
> 
> What am I missing here?
> 
> 							Thanx, Paul

I was missing that cpuhp_lock_* are simply lockdep annotations ... it's
getting late :)

So you're right, we need to verify that we don't get a 0 on the second access.

Will send an updated version soon.

Thanks!

* Re: [PATCH v1] CPU hotplug: active_reader not woken up in some cases - deadlock
  2014-12-08 19:30       ` David Hildenbrand
@ 2014-12-08 20:47         ` Paul E. McKenney
  0 siblings, 0 replies; 7+ messages in thread
From: Paul E. McKenney @ 2014-12-08 20:47 UTC
  To: David Hildenbrand
  Cc: linux-kernel, heiko.carstens, borntraeger, rafael.j.wysocki,
	peterz, srivatsa.bhat, oleg, bp, jkosina

On Mon, Dec 08, 2014 at 08:30:18PM +0100, David Hildenbrand wrote:
> > > active_writer is cleared while holding cpuhp_lock, so this should be safe,
> > > right?
> > 
> > You lost me on that one.  Don't we get to that piece of code precisely
> > because we don't hold any of the CPU-hotplug locks?  If so, the
> > writer might well hold all the locks it needs, and might well change
> > cpu_hotplug.active_writer out from under us.
> > 
> > What am I missing here?
> > 
> > 							Thanx, Paul
> 
> I was missing that cpuhp_lock_* are simply lockdep annotations ... it's
> getting late :)
> 
> So you're right, we need to verify that we don't get a 0 on the second access.

All you should need to do is something like this:

	struct task_struct *awp;

	awp = ACCESS_ONCE(cpu_hotplug.active_writer);
	if (awp)
		wake_up_process(awp);

That way there is only one access, so the check and the wake_up_process()
are guaranteed to see the same value.
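A userspace sketch of the single-access idea, assuming C11 atomics as a rough stand-in for the kernel's ACCESS_ONCE() (later kernels use READ_ONCE() for this): the pointer is loaded once into a local, and both the NULL check and the wakeup use that local. struct task, model_wake_up_process(), writer_finishes() and snapshot_wakeup() are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct task { int woken; };            /* hypothetical task_struct stand-in */

/* Hypothetical stand-in for cpu_hotplug.active_writer. */
static _Atomic(struct task *) active_writer;

/* Models wake_up_process(); returns -1 where the real call would
 * dereference a NULL pointer. */
static int model_wake_up_process(struct task *p)
{
	if (p == NULL)
		return -1;
	p->woken++;
	return 0;
}

/* Models cpu_hotplug_done() clearing active_writer on another CPU. */
static void writer_finishes(void)
{
	atomic_store_explicit(&active_writer, NULL, memory_order_relaxed);
}

/*
 * Load once into a local, then check and use only the local. A relaxed
 * load gives a single untorn read with no ordering, roughly what
 * ACCESS_ONCE() provides.
 */
static int snapshot_wakeup(void (*between)(void))
{
	struct task *awp =
		atomic_load_explicit(&active_writer, memory_order_relaxed);

	if (awp) {
		between();                         /* the writer may clear the global here */
		return model_wake_up_process(awp); /* but the local snapshot is unchanged */
	}
	return 0;
}
```

Even when writer_finishes() clears active_writer between the check and the call, snapshot_wakeup() passes the non-NULL pointer it actually checked and returns 0; once active_writer is NULL, it skips the wakeup entirely.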

> Will send an updated version soon.

Sounds good!

							Thanx, Paul